STUDIES IN THE HISTORY OF PROBABILITY AND STATISTICS


IX. THOMAS BAYES’S ESSAY TOWARDS SOLVING A PROBLEM 
IN THE DOCTRINE OF CHANCES* 


[Reproduced with the permission of the Council of the Royal Society from
The Philosophical Transactions (1763), 53, 370-418] 


THOMAS BAYES—A BIOGRAPHICAL NOTE 


By G. A. BARNARD 


Imperial College, London 


Bayes’s paper, reproduced in the following pages, must rank as one of the most famous 
memoirs in the history of science and the problem it discusses is still the subject of keen 
controversy. The intellectual stature of Bayes himself is measured by the fact that it is still 
of scientific as well as historical interest to know what Bayes had to say on the questions he 
raised. And yet such are the vagaries of historical records, that almost nothing is known 
about the personal history of the man. The Dictionary of National Biography, compiled at 
the end of the last century, when the whole theory of probability was in temporary eclipse in 
England, has an entry devoted to Bayes’s father, Joshua Bayes, F.R.S., one of the first six 
Nonconformist ministers to be publicly ordained as such in England, but it has nothing on 
his much more distinguished son. Indeed, the note on Thomas Bayes which is to appear in 
the forthcoming new edition of the Encyclopedia Britannica will apparently be the first 
biographical note on Bayes to appear in a work of general reference since the Imperial 
Dictionary of Universal Biography was published in Glasgow in 1865. And in treatises on the 
history of mathematics, such as that of Loria (1933) and Cantor (1908), notice is taken of his 
contributions to probability theory and to mathematical analysis, but biographical details 
are lacking. 

The Reverend Thomas Bayes, F.R.S., author of the first expression in precise, quantita- 
tive form of one of the modes of inductive inference, was born in 1702, the eldest son of 
Ann Bayes and Joshua Bayes, F.R.S. He was educated privately, as was usual with Non- 
conformists at that time, and from the fact that when Thomas was 12 Bernoulli wrote to 
Leibniz that ‘poor de Moivre’ was having to earn a living in London by teaching mathe- 
matics, we are tempted to speculate that Bayes may have learned mathematics from one of 
the founders of the theory of probability. Eventually Thomas was ordained, and began his 
ministry by helping his father, who was at the time stated, minister of the Presbyterian 
meeting house in Leather Lane, off Holborn. Later the son went to minister in Tunbridge 
Wells at the Presbyterian Chapel on Little Mount Sion which had been opened on 1 August
1720. It is not known when Bayes went to Tunbridge Wells, but he was not the first to 
minister on Little Mount Sion, and he was certainly there in 1731, when he produced a tract 
entitled ‘Divine Benevolence, or an attempt to prove that the Principal End of the Divine

* Thomas Bayes’s famous Essay is so often referred to in current statistical literature, but so rarely
studied because of the difficulty of access, that the Editors have felt justified in reprinting it in the 
Biometrika History of Probability and Statistics series. 
Providence and Government is the happiness of His Creatures’. The tract was published by
John Noon and copies are in Dr Williams’s library and the British Museum. The following 
is a quotation: 

[p. 22]: I don’t find (I am sorry to say it) any necessary connection between mere intelligence,
though ever so great, and the love or approbation of kind and beneficent actions. 

Bayes argued that the principal end of the Deity was the happiness of His creatures, in 
opposition to Balguy and Grove who had, respectively, maintained that the first spring of 
action of the Deity was Rectitude, and Wisdom. 

In 1736 John Noon published a tract entitled ‘An Introduction to the Doctrine of Fluxions, 
and a Defence of the Mathematicians against the objections of the Author of the Analyst’. 
De Morgan (1860) says: ‘This very acute tract is anonymous, but it was always attributed 
to Bayes by the contemporaries who write in the names of the authors as I have seen in 
various copies, and it bears his name in other places.’ The ascription to Bayes is accepted 
also in the British Museum catalogue. 

From the copy in Dr Williams’s library we quote: 


[p. 9]: It is not the business of the Mathematician to dispute whether quantities do in fact ever vary in 
the manner that is supposed, but only whether the notion of their doing so be intelligible; which being 
allowed, he has a right to take it for granted, and then see what deductions he can make from that sup- 
position. It is not the business of a Mathematician to show that a strait line or circle can be drawn, but 
he tells you what he means by these; and if you understand him, you may proceed further with him; and 
it would not be to the purpose to object that there is no such thing in nature as a true strait line or 
perfect circle, for this is none of his concern: he is not inquiring how things are in matter of fact, but 
supposing things to be in a certain way, what are the consequences to be deduced from them; and all that 
is to be demanded of him is, that his suppositions be intelligible, and his inferences just from the sup- 
positions he makes. 

[p. 48]: He [i.e. the Analyst = Bishop Berkeley] represents the disputes and controversies among 
mathematicians as disparaging the evidence of their methods: and, Query 51, he represents Logics and 
Metaphysics as proper to open their eyes, and extricate them from their difficulties. Now were ever two 
things thus put together? If the disputes of the professors of any science disparage the science itself, 
Logics and Metaphysics are much more to be disparaged than Mathematics; why, therefore, if I am half 
blind, must I take for my guide one that can’t see at all? 

[p. 50]: So far as Mathematics do not tend to make men more sober and rational thinkers, wiser and
better men, they are only to be considered as an amusement, which ought not to take us off from serious 
business. 


This tract may have had something to do with Bayes’s election, in 1742, to Fellowship of the 
Royal Society, for which his sponsors were Earl Stanhope, Martin Folkes, James Burrow, 
Cromwell Mortimer, and John Eames. 

William Whiston, Newton’s successor in the Lucasian Chair at Cambridge, who was 
expelled from the University for Arianism, notes in his Memoirs (p. 390) that ‘on August the 
24th this year 1746, being Lord’s Day, and St. Bartholomew’s Day, I breakfasted at Mr Bay’s, 
a dissenting Minister at Tunbridge Wells, and a Successor, though not immediate, to 
Mr Humphrey Ditton, and like him a very good mathematician also’. Whiston goes on to 
relate what he said to Bayes, but he gives no indication that Bayes made reply. 

According to Strange (1949) Bayes wished to retire from his ministry as early as 1749, 
when he allowed a group of Independents to bring ministers from London to take services in 
his chapel week by week, except for Easter, 1750, when he refused his pulpit to one of these 
preachers; and in 1752 he was succeeded in his ministry by the Rev. William Johnston, A.M., 
who inherited Bayes’s valuable library. Bayes continued to live in Tunbridge Wells until 
his death on 17 April 1761. His body was taken to be buried, with that of his father, mother, 
brothers and sisters, in the Bayes and Cotton family vault in Bunhill Fields, the Nonconformist
burial ground by Moorgate. This cemetery also contains the grave of Bayes’s friend,
the Unitarian Rev. Richard Price, author of the Northampton Life Table and object of 
Burke’s oratory and invective in Reflections on the French Revolution, and the graves 
of John Bunyan, Samuel Watts, Daniel Defoe, and many other famous men. 

Bayes’s will, executed on 12 December 1760, shows him to have been a man of substance. 
The bulk of his estate was divided among his brothers, sisters, nephews and cousins, but he 
left £200 equally between ‘John Boyl late preacher at Newington and now at Norwich, and
Richard Price now I suppose preacher at Newington Green’. He also left ‘To Sarah Jeffery 
daughter of John Jeffery, living with her father at the corner of Fountains Lane near 
Tonbridge Wells, £500, and my watch made by Elliott and all my linen and wearing apparell 
and household stuff.’ 

Apart from the tracts already noted, and the celebrated Essay reproduced here, Bayes 
wrote a letter on Asymptotic Series to John Canton, published in the Philosophical Transac- 
tions of the Royal Society (1763, pp. 269-271). His mathematical work, though small in 
quantity, is of the very highest quality; both his tract on fluxions and his paper on asymp- 
totic series contain thoughts which did not receive as clear expression again until almost 
a century had elapsed. 

Since copies of the volume in which Bayes’s essay first appeared are not rare, and copies of 
a photographic reprint issued by the Department of Agriculture, Washington, D.C., U.S.A.,
are fairly widely dispersed, the view has been taken that in preparing Bayes’s paper for 
publication here some editing is permissible. In particular, the notation has been modernized, 
some of the archaisms have been removed and what seem to be obvious printer’s errors have 
been corrected. Sometimes, when a word has been omitted in the original, a suggestion has 
been supplied, enclosed in square brackets. Otherwise, however, nothing has been changed, 
and we hope that while the present text should in no sense be regarded as definitive, it 
will be easier to read on that account. All the work of preparing the text for the printer was 
most painstakingly and expertly carried out by Mr M. Gilbert, B.Sc., A.R.C.S. Thanks are 
also due to the Royal Society for permission to reproduce the Essay in its present form. 

In writing the biographical notes the present author has had the friendly help of many 
persons, including especially Dr A. Fletcher and Mr R. L. Plackett, of the University of 
Liverpool, Mr J. F. C. Willder, of the Department of Pathology, Guy’s Hospital Medical 
School, and Mr M. E. Ogborn, F.I.A., of the Equitable Life Assurance Society. He would
also like to thank Sir Ronald Fisher, for some initial prodding which set him moving, and 
Prof. E. S. Pearson, for patient encouragement to see the matter through to completion.


G. A. BARNARD 
AN ESSAY TOWARDS SOLVING A PROBLEM IN THE
DOCTRINE OF CHANCES 


By THE LATE Rev. Mr BAYES, F.R.S.
Communicated by Mr Price, in a Letter to John Canton, A.M., F.R.S.


Read 23 December 1763 

Dear Sir, 

I now send you an essay which I have found among the papers of our deceased friend 
Mr Bayes, and which, in my opinion, has great merit, and well deserves to be preserved. 
Experimental philosophy, you will find, is nearly interested in the subject of it; and on this 
account there seems to be particular reason for thinking that a communication of it to the 
Royal Society cannot be improper. 

He had, you know, the honour of being a member of that illustrious Society, and was much 
esteemed by many in it as a very able mathematician. In an introduction which he has 
writ to this Essay, he says, that his design at first in thinking on the subject of it was, to 
find out a method by which we might judge concerning the probability that an event has to 
happen, in given circumstances, upon supposition that we know nothing concerning it but 
that, under the same circumstances, it has happened a certain number of times, and failed 
a certain other number of times. He adds, that he soon perceived that it would not be very 
difficult to do this, provided some rule could be found according to which we ought to 
estimate the chance that the probability for the happening of an event perfectly unknown, 
should lie between any two named degrees of probability, antecedently to any experiments 
made about it; and that it appeared to him that the rule must be to suppose the chance the 
same that it should lie between any two equidifferent degrees; which, if it were allowed, all 
the rest might be easily calculated in the common method of proceeding in the doctrine of 
chances. Accordingly, I find among his papers a very ingenious solution of this problem in 
this way. But he afterwards considered, that the postulate on which he had argued might not 
perhaps be looked upon by all as reasonable; and therefore he chose to lay down in another 
form the proposition in which he thought the solution of the problem is contained, and in 
a scholium to subjoin the reasons why he thought so, rather than to take into his
mathematical reasoning any thing that might admit dispute. This, you will observe, is the method
which he has pursued in this essay. 

Every judicious person will be sensible that the problem now mentioned is by no means 
merely a curious speculation in the doctrine of chances, but necessary to be solved in order 
to [provide] a sure foundation for all our reasonings concerning past facts, and what is likely 
to be hereafter. Common sense is indeed sufficient to shew us that, from the observation of 
what has in former instances been the consequence of a certain cause or action, one may make 
a judgment what is likely to be the consequence of it another time, and that the larger [the] 
number of experiments we have to support a conclusion, so much the more reason we have 
to take it for granted. But it is certain that we cannot determine, at least not to any nicety, 
in what degree repeated experiments confirm a conclusion, without the particular discussion 
of the beforementioned problem; which, therefore, is necessary to be considered by any 
one who would give a clear account of the strength of analogical or inductive reasoning;
concerning, which at present, we seem to know little more than that it does sometimes 
in fact convince us, and at other times not; and that, as it is the means of [a]cquainting 
us with many truths, of which otherwise we must have been ignorant; so it is, in all proba- 
bility, the source of many errors, which perhaps might in some measure be avoided, if the 
force that this sort of reasoning ought to have with us were more distinctly and clearly 
understood. 

These observations prove that the problem enquired after in this essay is no less important 
than it is curious. It may be safely added, I fancy, that it is also a problem that has never 
before been solved. Mr De Moivre, indeed, the great improver of this part of mathematics, 
has in his Laws of Chance,* after Bernoulli, and to a greater degree of exactness, given rules 
to find the probability there is, that if a very great number of trials be made concerning any 
event, the proportion of the number of times it will happen, to the number of times it will fail 
in those trials, should differ less than by small assigned limits from the proportion of the 
probability of its happening to the probability of its failing in one single trial. But I know of 
no person who has shewn how to deduce the solution of the converse problem to this; 
namely, ‘the number of times an unknown event has happened and failed being given, to 
find the chance that the probability of its happening should lie somewhere between any two 
named degrees of probability.’ What Mr De Moivre has done therefore cannot be thought 
sufficient to make the consideration of this point unnecessary: especially, as the rules he has 
given are not pretended to be rigorously exact, except on supposition that the number of 
trials made are infinite; from whence it is not obvious how large the number of trials must be 
in order to make them exact enough to be depended on in practice. 

Mr De Moivre calls the problem he has thus solved, the hardest that can be proposed on the 
subject of chance. His solution he has applied to a very important purpose, and thereby 
shewn that those are much mistaken who have insinuated that the Doctrine of Chances in 
mathematics is of trivial consequence, and cannot have a place in any serious enquiry.+ The 
purpose I mean is, to shew what reason we have for believing that there are in the constitu- 
tion of things fixt laws according to which events happen, and that, therefore, the frame of 
the world must be the effect of the wisdom and power of an intelligent cause; and thus to 
confirm the argument taken from final causes for the existence of the Deity. It will be easy 
to see that the converse problem solved in this essay is more directly applicable to this 
purpose; for it shews us, with distinctness and precision, in every case of any particular order 
or recurrency of events, what reason there is to think that such recurrency or order is derived 
from stable causes or regulations in nature, and not from any of the irregularities of 
chance. 

The two last rules in this essay are given without the deductions of them. I have chosen to 
do this because these deductions, taking up a good deal of room, would swell the essay too 
much; and also because these rules, though of considerable use, do not answer the purpose for 
which they are given as perfectly as could be wished. They are however ready to be produced, 
if a communication of them should be thought proper. I have in some places writ short 
notes, and to the whole I have added an application of the rules in the essay to some 




* See Mr De Moivre’s Doctrine of Chances, p. 243, etc. He has omitted the demonstrations of his rules, 
but these have been since supplied by Mr Simpson at the conclusion of his treatise on The Nature and 
Laws of Chance. 

+ See his Doctrine of Chances, p. 252, etc.
particular cases, in order to convey a clearer idea of the nature of the problem, and to shew
how far the solution of it has been carried. 

I am sensible that your time is so much taken up that I cannot reasonably expect that you 
should minutely examine every part of what I now send you. Some of the calculations, 
particularly in the Appendix, no one can make without a good deal of labour. I have taken 
so much care about them, that I believe there can be no material error in any of them; but 
should there be any such errors, I am the only person who ought to be considered as answer- 
able for them. 

Mr Bayes has thought fit to begin his work with a brief demonstration of the general laws 
of chance. His reason for doing this, as he says in his introduction, was not merely that his 
reader might not have the trouble of searching elsewhere for the principles on which he has 
argued, but because he did not know whither to refer him for a clear demonstration of them. 
He has also made an apology for the peculiar definition he has given of the word chance or 
probability. His design herein was to cut off all dispute about the meaning of the word, which 
in common language is used in different senses by persons of different opinions, and according 
as it is applied to past or future facts. But whatever different senses it may have, all (he 
observes) will allow that an expectation depending on the truth of any past fact, or the 
happening of any future event, ought to be estimated so much the more valuable as the fact 
is more likely to be true, or the event more likely to happen. Instead therefore, of the proper 
sense of the word probability, he has given that which all will allow to be its proper measure 
in every case where the word is used. But it is time to conclude this letter. Experimental 
philosophy is indebted to you for several discoveries and improvements; and, therefore, 
I cannot help thinking that there is a peculiar propriety in directing to you the following 
essay and appendix. That your enquiries may be rewarded with many further successes, and 
that you may enjoy every valuable blessing, is the sincere wish of, Sir, 


your very humble servant, 


Newington-Green, Richard Price 
10 November 1763 


PROBLEM 


Given the number of times in which an unknown event has happened and failed: Required 
the chance that the probability of its happening in a single trial lies somewhere between 
any two degrees of probability that can be named. 
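
[Editorial illustration, not part of the Essay.] In modern notation, and assuming the postulate described in Mr Price's letter (that the unknown probability is equally likely to lie between any two equidifferent degrees, i.e. a uniform prior), the chance required is the integral of x^p (1-x)^q over the named interval divided by the same integral over the whole range from 0 to 1. The Python sketch below evaluates this exactly for small p and q; the function names `_integral` and `chance_between` are hypothetical and chosen only for this example.

```python
from fractions import Fraction
from math import comb


def _integral(p: int, q: int, upper: Fraction) -> Fraction:
    """Exact integral of x**p * (1 - x)**q from 0 to `upper`."""
    # Expand (1 - x)**q by the binomial theorem and integrate term by term.
    return sum(
        Fraction((-1) ** k * comb(q, k), p + k + 1) * upper ** (p + k + 1)
        for k in range(q + 1)
    )


def chance_between(p: int, q: int, lo, hi) -> Fraction:
    """Chance that the unknown probability lies between `lo` and `hi`
    after p happenings and q failures, ASSUMING a uniform prior on [0, 1]."""
    lo, hi = Fraction(lo), Fraction(hi)
    total = _integral(p, q, Fraction(1))
    return (_integral(p, q, hi) - _integral(p, q, lo)) / total
```

For instance, `chance_between(1, 0, Fraction(1, 2), 1)` gives the chance, after a single happening and no failure, that the probability of the event exceeds one half.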


Section I 


DEFINITION 1. Several events are inconsistent, when if one of them happens, none of the 
rest can. 

2. Two events are contrary when one, or other of them must; and both together cannot 
happen. 

3. An event is said to fail, when it cannot happen; or, which comes to the same thing, when 
its contrary has happened. 

4. An event is said to be determined when it has either happened or failed. 

5. The probability of any event is the ratio between the value at which an expectation 
depending on the happening of the event ought to be computed, and the value of the thing 
expected upon its happening.
6. By chance I mean the same as probability.
7. Events are independent when the happening of any one of them does neither increase 
nor abate the probability of the rest. 


Prop. 1 


When several events are inconsistent the probability of the happening of one or other of 
them is the sum of the probabilities of each of them. 

Suppose there be three such events, and whichever of them happens I am to receive N, and
that the probability of the 1st, 2nd, and 3rd are respectively a/N, b/N, c/N. Then (by the
definition of probability) the value of my expectation from the 1st will be a, from the 2nd b,
and from the 3rd c. Wherefore the value of my expectations from all three will be a+b+c.
But the sum of my expectations from all three is in this case an expectation of receiving N
upon the happening of one or other of them. Wherefore (by definition 5) the probability of
one or other of them is (a+b+c)/N or a/N + b/N + c/N, the sum of the probabilities of each
of them.


COROLLARY. If it be certain that one or other of the three events must happen, then
a+b+c = N. For in this case all the expectations together amounting to a certain expectation
of receiving N, their values together must be equal to N. And from hence it is plain that
the probability of an event added to the probability of its failure (or of its contrary) is the
ratio of equality. For these are two inconsistent events, one of which necessarily happens.
Wherefore if the probability of an event is P/N that of its failure will be (N - P)/N.
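
[Editorial illustration, not part of the Essay.] Prop. 1 and its corollary can be checked by exact enumeration. The sketch below assumes a fair six-sided die as a hypothetical example; the helper name `prob` is likewise only illustrative.

```python
from fractions import Fraction

FACES = range(1, 7)  # assumption: a fair six-sided die, purely for illustration


def prob(event):
    """Exact probability of `event` (a predicate on a face) for the fair die."""
    return Fraction(sum(1 for face in FACES if event(face)), len(FACES))


# Three inconsistent events: the die shows 1, or 2, or 3.
p1, p2, p3 = (prob(lambda f, k=k: f == k) for k in (1, 2, 3))
p_any = prob(lambda f: f in (1, 2, 3))

# Corollary: an event and its failure together make up certainty.
p_fail = prob(lambda f: f not in (1, 2, 3))
```

Here `p_any` equals `p1 + p2 + p3`, and `p_any + p_fail` equals 1, which is the "ratio of equality" of the corollary.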


Prop. 2 


If a person has an expectation depending on the happening of an event, the probability of 
the event is to the probability of its failure as his loss if it fails to his gain if it happens. 

Suppose a person has an expectation of receiving N, depending on an event the proba- 
bility of which is P/N. Then (by definition 5) the value of his expectation is P, and therefore 
if the event fail, he loses that which in value is P; and if it happens he receives N, but his 
expectation ceases. His gain therefore is N - P. Likewise since the probability of the event
is P/N, that of its failure (by corollary prop. 1) is (N - P)/N. But P/N is to (N - P)/N as P is
to N - P, i.e. the probability of the event is to the probability of its failure, as his loss if it
fails to his gain if it happens. 
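
[Editorial illustration, not part of the Essay.] The proportion in Prop. 2 can be verified numerically. In this sketch the function name `loss_and_gain` and the sample figures are assumptions made for the example only.

```python
from fractions import Fraction


def loss_and_gain(N, prob):
    """Return (loss if the event fails, gain if it happens) for an
    expectation of receiving N on an event whose probability is `prob`."""
    N, prob = Fraction(N), Fraction(prob)
    P = prob * N      # value of the expectation (definition 5)
    loss = P          # forfeited if the event fails
    gain = N - P      # N is received, but the expectation worth P ceases
    return loss, gain


# Hypothetical figures: a prize of 12 on an event of probability 1/4.
loss, gain = loss_and_gain(12, Fraction(1, 4))
odds = Fraction(1, 4) / (1 - Fraction(1, 4))   # probability : probability of failure
```

With these figures the loss is 3 and the gain 9, and `odds` equals `loss / gain`, as the proposition asserts.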

Prop. 3 


The probability that two subsequent events will both happen is a ratio compounded of the
probability of the 1st, and the probability of the 2nd on supposition the 1st happens.

Suppose that, if both events happen, I am to receive N, that the probability both will
happen is P/N, that the 1st will is a/N (and consequently that the 1st will not is (N - a)/N)
and that the 2nd will happen upon supposition the 1st does is b/N. Then (by definition 5)
P will be the value of my expectation, which will become b if the 1st happens. Consequently
if the 1st happens, my gain by it is b - P, and if it fails my loss is P. Wherefore, by the
foregoing proposition, a/N is to (N - a)/N, i.e. a is to N - a as P is to b - P. Wherefore
(componendo inverse) a is to N as P is to b. But the ratio of P to N is compounded of the ratio of P
to b, and that of b to N. Wherefore the same ratio of P to N is compounded of the ratio of
a to N and that of b to N, i.e. the probability that the two subsequent events will both happen
is compounded of the probability of the 1st and the probability of the 2nd on supposition the
1st happens.
COROLLARY. Hence if of two subsequent events the probability of the 1st be a/N, and the
probability of both together be P/N, then the probability of the 2nd on supposition the
1st happens is P/a.
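
[Editorial illustration, not part of the Essay.] Prop. 3 and its corollary can be checked by counting equally likely cases. The urn below (three white balls and two black, two drawn without replacement) is a hypothetical example, and the names used are illustrative only.

```python
from fractions import Fraction
from itertools import permutations

URN = ["w", "w", "w", "b", "b"]   # assumption: 3 white and 2 black balls
# Every ordered pair of distinct balls is an equally likely pair of draws.
DRAWS = list(permutations(range(len(URN)), 2))


def prob(event):
    """Exact probability of `event`, a predicate on an ordered pair of draws."""
    return Fraction(sum(1 for d in DRAWS if event(d)), len(DRAWS))


p_first = prob(lambda d: URN[d[0]] == "w")                       # a/N
p_both = prob(lambda d: URN[d[0]] == "w" and URN[d[1]] == "w")   # P/N
p_second_given_first = p_both / p_first                          # corollary: P/a
```

The result for `p_second_given_first` is 1/2, which agrees with direct reasoning: once a white ball is removed, two of the remaining four are white.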

Prop. 4 


If there be two subsequent events to be determined every day, and each day the probability
of the 2nd is b/N and the probability of both P/N, and I am to receive N if both the events
happen the first day on which the 2nd does; I say, according to these conditions, the
probability of my obtaining N is P/b. For if not, let the probability of my obtaining N be x/N and
let y be to x as N - b to N. Then since x/N is the probability of my obtaining N (by definition 1)
x is the value of my expectation. And again, because according to the foregoing conditions the
first day I have an expectation of obtaining N depending on the happening of both the events
together, the probability of which is P/N, the value of this expectation is P. Likewise, if this
coincident should not happen I have an expectation of being reinstated in my former
circumstances, i.e. of receiving that which in value is x depending on the failure of the 2nd
event the probability of which (by cor. prop. 1) is (N - b)/N or y/x, because y is to x as N - b
to N. Wherefore since x is the thing expected and y/x the probability of obtaining it, the value
of this expectation is y. But these two last expectations together are evidently the same with
my original expectation, the value of which is x, and therefore P + y = x. But y is to x as
N - b is to N. Wherefore x is to P as N is to b, and x/N (the probability of my obtaining N)
is P/b.
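
[Editorial illustration, not part of the Essay.] The daily scheme of Prop. 4 can be followed day by day: the prize is won on a given day if both events happen, and the matter stays undecided only while the 2nd event fails. The sketch below sums these chances over a finite number of days; the function name `chance_of_prize` and the sample probabilities are assumptions for the example.

```python
from fractions import Fraction


def chance_of_prize(p_both, p_second, days):
    """Chance of winning within `days` days, where each day both events
    happen with probability `p_both` (P/N) and the 2nd happens with
    probability `p_second` (b/N). Play continues only while the 2nd fails."""
    total = Fraction(0)
    still_undecided = Fraction(1)
    for _ in range(days):
        total += still_undecided * p_both       # won on this day
        still_undecided *= 1 - p_second         # 2nd event failed; try again
    return total


# Hypothetical figures: P/N = 1/10 and b/N = 1/4, so P/b = 2/5.
approx = chance_of_prize(Fraction(1, 10), Fraction(1, 4), 200)
```

As the number of days grows, the sum approaches P/b, the value obtained in the proposition by the equation P + y = x.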


Cor. Suppose after the expectation given me in the foregoing proposition, and before it is 
at all known whether the 1st event has happened or not, I should find that the 2nd event has
happened; from hence I can only infer that the event is determined on which my expectation 
depended, and have no reason to esteem the value of my expectation either greater or less 
than it was before. For if I have reason to think it less, it would be reasonable for me to give 
something to be reinstated in my former circumstances, and this over and over again as often 
as I should be informed that the 2nd event had happened, which is evidently absurd. And 
the like absurdity plainly follows if you say I ought to set a greater value on my expectation 
than before, for then it would be reasonable for me to refuse something if offered me upon 
condition I would relinquish it, and be reinstated in my former circumstances; and this 
likewise over and over again as often as (nothing being known concerning the 1st event) it
should appear that the 2nd had happened. Notwithstanding therefore this discovery that 
the 2nd event has happened, my expectation ought to be esteemed the same in value as 
before, i.e. x, and consequently the probability of my obtaining N is (by definition 5) still 
x/N or P/b.* But after this discovery the probability of my obtaining N is the probability 
that the 1st of two subsequent events has happened upon the supposition that the 2nd has,
whose probabilities were as before specified. But the probability that an event has happened 
is the same as the probability I have to guess right if I guess it has happened. Wherefore the 
following proposition is evident. 


* What is here said may perhaps be a little illustrated by considering that all that can be lost by the 
happening of the 2nd event is the chance I should have had of being reinstated in my former circum- 
stances, if the event on which my expectation depended had been determined in the manner expressed in 
the proposition. But this chance is always as much against me as it is for me. If the 1st event happens, 
it is against me, and equal to the chance for the 2nd event's failing. If the 1st event does not happen,
it is for me, and equal also to the chance for the 2nd event's failing. The loss of it, therefore, can be no 
disadvantage. 
Prop. 5


If there be two subsequent events, the probability of the 2nd b/N and the probability of 
both together P/N, and it being first discovered that the 2nd event has happened, from hence 
I guess that the 1st event has also happened, the probability I am in the right is P/b.* 
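
[Editorial illustration, not part of the Essay.] Prop. 5 is the rule now usually written as a conditional probability: the chance of the 1st event, given that the 2nd has happened, is P/b. The figures below (a bag holding one loaded die and three fair dice, the 2nd event being the throw of a six) are hypothetical and serve only to show the arithmetic.

```python
from fractions import Fraction

# Assumed figures, for illustration only.
p_first = Fraction(1, 4)             # chance the loaded die is drawn (1st event)
p_second_if_first = Fraction(1, 2)   # chance of a six with the loaded die
p_second_if_not = Fraction(1, 6)     # chance of a six with a fair die

p_both = p_first * p_second_if_first                    # P/N, by prop. 3
p_second = p_both + (1 - p_first) * p_second_if_not     # b/N, by prop. 1
p_first_given_second = p_both / p_second                # prop. 5: P/b
```

With these figures, on learning that a six was thrown, the chance of being right in guessing that the loaded die was drawn rises from 1/4 to 1/2.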


Prop. 6 


The probability that several independent events shall all happen is a ratio compounded of 
the probabilities of each. 

For from the nature of independent events, the probability that any one happens is not 
altered by the happening or failing of any of the rest, and consequently the probability that 
the 2nd event happens on supposition the 1st does is the same with its original probability;
but the probability that any two events happen is a ratio compounded of the probability
of the 1st event, and the probability of the 2nd on supposition the 1st happens by prop. 3.
Wherefore the probability that any two independent events both happen is a ratio
compounded of the probability of the 1st and the probability of the 2nd. And in like manner
considering the 1st and 2nd events together as one event; the probability that three
independent events all happen is a ratio compounded of the probability that the two 1st
both happen and the probability of the 3rd. And thus you may proceed if there be ever so 
many such events; from whence the proposition is manifest. 


Cor. 1. If there be several independent events, the probability that the 1st happens the 
2nd fails, the 3rd fails and the 4th happens, etc. is a ratio compounded of the probability of 
the 1st, and the probability of the failure of the 2nd, and the probability of the failure of the
3rd, and the probability of the 4th, etc. For the failure of an event may always be considered 
as the happening of its contrary. 


Cor. 2. If there be several independent events, and the probability of each one be a, and 
that of its failing be b, the probability that the 1st happens and the 2nd fails, and the 3rd fails
and the 4th happens, etc. will be abba, etc. For, according to the algebraic way of notation,
if a denote any ratio and b another, abba denotes the ratio compounded of the ratios a, b, b, a.
This corollary therefore is only a particular case of the foregoing. 
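
[Editorial illustration, not part of the Essay.] Cor. 2 amounts to multiplying a factor a for every happening and a factor b for every failure, in the order given. In the sketch below the function name `chance_of_sequence` and the encoding of a sequence as a string of "H" (happens) and "F" (fails) are assumptions made for the example.

```python
from fractions import Fraction
from math import prod


def chance_of_sequence(pattern, a):
    """Probability of a particular order of happenings ('H') and failures
    ('F') in independent trials, each with probability `a` of happening."""
    a = Fraction(a)
    b = 1 - a   # probability of failing (cor. prop. 1)
    return prod(a if outcome == "H" else b for outcome in pattern)


# The order "happens, fails, fails, happens" of the corollary, with a = 1/3.
abba = chance_of_sequence("HFFH", Fraction(1, 3))
```

The value depends only on how many happenings and failures occur, not on their order, so `chance_of_sequence("HHFF", a)` gives the same result.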


DEFINITION. If in consequence of certain data there arises a probability that a certain
event should happen, its happening or failing, in consequence of these data, I call its
happening or failing in the 1st trial. And if the same data be again repeated, the happening or
failing of the event in consequence of them I call its happening or failing in the 2nd trial; and 
so on as often as the same data are repeated. And hence it is manifest that the happening or 
failing of the same event in so many diffe[rent] trials, is in reality the happening or failing of 
so many distinct independent events exactly similar to each other. 


* What is proved by Mr Bayes in this and the preceding proposition is the same with the answer to the 
following question. What is the probability that a certain event, when it happens, will be accompanied 
with another to be determined at the same time? In this case, as one of the events is given, nothing can 
be due for the expectation of it; and, consequently, the value of an expectation depending on the hap- 
pening of both events must be the same with the value of an expectation depending on the happening of 
one of them. In other words; the probability that, when one of two events happens, the other will, is the 
same with the probability of this other. Call x then the probability of this other, and if b/N be the 
probability of the given event, and p/N the probability of both, because p/N = (b/N) × x, x = p/b = the 
probability mentioned in these propositions. 


Prop. 7 


If the probability of an event be a, and that of its failure be b in each single trial, the 
probability of its happening p times, and failing q times in p + q trials is $E a^p b^q$ if E be the 
coefficient of the term in which occurs $a^p b^q$ when the binomial $(a+b)^{p+q}$ is expanded. 

For the happening or failing of an event in different trials are so many independent events. 
Wherefore (by cor. 2 prop. 6) the probability that the event happens the 1st trial, fails the 
2nd and 3rd, and happens the 4th, fails the 5th, etc. (thus happening and failing till the 
number of times it happens be p and the number it fails be q) is abbab etc. till the number of 
a’s be p and the number of b’s be q, that is; ’tis $a^p b^q$. In like manner if you consider the event 
as happening p times and failing q times in any other particular order, the probability for it is 
$a^p b^q$; but the number of different orders according to which an event may happen or fail, so 
as in all to happen p times and fail q, in p + q trials is equal to the number of permutations 
that aaaa bbb admit of when the number of a’s is p, and the number of b’s is q. And this 
number is equal to E, the coefficient of the term in which occurs $a^p b^q$ when $(a+b)^{p+q}$ is 
expanded. The event therefore may happen p times and fail q in p + q trials E different ways 
and no more, and its happening and failing these several different ways are so many inconsistent 
events, the probability for each of which is $a^p b^q$, and therefore by prop. 1 the probability 
that some way or other it happens p times and fails q times in p + q trials is $E a^p b^q$. 
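
As a small illustration of Prop. 7, the following sketch computes $E a^p b^q$ exactly with rational arithmetic. The function name `prob_p_successes` is ours and purely illustrative; the only assumption beyond the text is that the coefficient E is the binomial coefficient, which Python exposes as `math.comb`.

```python
from fractions import Fraction
from math import comb


def prob_p_successes(a, p, q):
    """Prop. 7: chance that an event of single-trial probability a
    happens exactly p times and fails q times in p + q trials."""
    a = Fraction(a)
    b = 1 - a                      # probability of failure in one trial
    E = comb(p + q, p)             # coefficient of a^p b^q in (a + b)^(p+q)
    return E * a**p * b**q


if __name__ == "__main__":
    # Two happenings and one failure with a = 1/3: 3 * (1/3)^2 * (2/3)
    print(prob_p_successes(Fraction(1, 3), 2, 1))
```

Summing the result over every p from 0 to n gives 1, as it must, since $(a+b)^n = 1$ when a + b = 1.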


Section II 


POSTULATE. 1. I suppose the square table or plane ABCD to be so made and levelled, that if 
either of the balls o or W be thrown upon it, there shall be the same probability that it rests 
upon any one equal part of the plane as another, and that it must necessarily rest somewhere 
upon it. 

2. I suppose that the ball W shall be first thrown, and through the point where it rests 
a line os shall be drawn parallel to AD, and meeting CD and AB in s and o; and that afterwards 
the ball O shall be thrown p + q or n times, and that its resting between AD and os after 
a single throw be called the happening of the event M in a single trial. These things supposed: 


Lem. 1. The probability that the point o will fall between any two points in the line AB 
is the ratio of the distance between the two points to the whole line AB. 

Let any two points be named, as f and b in the line AB, and through them parallel to AD 
draw fF, bL meeting CD in F and L. Then if the rectangles Cf, Fb, LA are commensurable to 
each other, they may each be divided into the same equal parts, which being done, and the 
ball W thrown, the probability it will rest somewhere [upon any number of these equal parts 
will be the sum of the probabilities it has to rest upon each one of them, because its resting 
upon any different parts of the plane] AC are so many inconsistent events; and this sum, 
[because the probability it should rest upon any one equal part as another is the same, is the 
probability it should rest upon any one equal part multiplied by the number of] 
parts. Consequently, the probability there is that the ball W should rest somewhere upon Fb 
is the probability it has to rest upon one equal part multiplied by the number of equal parts 
in Fb; and the probability it rests somewhere upon Cf or LA, i.e. that it does not rest upon Fb 
(because it must rest somewhere upon AC) is the probability it rests upon one equal part 
multiplied by the number of equal parts in Cf, LA taken together. Wherefore, the probability 
it rests upon Fb is to the probability it does not as the number of equal parts in Fb is to the 
number of equal parts in Cf, LA together, or as Fb to Cf, LA together, or as fb to Bf, Ab 
together. Wherefore the probability it rests upon Fb is to the probability it does not as fb to 
Bf, Ab together. And (componendo inverse) the probability it rests upon Fb is to the probability 
it rests upon Fb added to the probability it does not, as fb to AB, or as the ratio of fb to 
AB to the ratio of AB to AB. But the probability of any event added to the probability of its 
failure is the ratio of equality; wherefore, the probability it rests upon Fb is to the ratio of 
equality as the ratio of fb to AB to the ratio of AB to AB, or the ratio of equality; and there- 
fore the probability it rests upon Fb is the ratio of fb to AB. But ex hypothesi according as 
the ball W falls upon Fb or not the point o will lie between f and b or not, and therefore the 
probability the point o will lie between f and b is the ratio of fb to AB. 

Again; if the rectangles Cf, Fb, LA are not commensurable, yet the last mentioned 
probability can be neither greater nor less than the ratio of fb to AB; for, if it be less, let it 
be the ratio of fc to AB, and upon the line fb take the points p and t, so that pt shall be greater 
than fc, and the three lines Bp, pt, tA commensurable (which it is evident may be always 
done by dividing AB into equal parts less than half cb, and taking p and t the nearest points 
of division to f and c that lie upon fb). Then because Bp, pt, tA are commensurable, so are the 
rectangles Cp, Dt, and that upon pt compleating the square AB. Wherefore, by what has 
been said, the probability that the point o will lie between p and t is the ratio of pt to AB. 
But if it lies between p and t it must lie between f and b. Wherefore, the probability it should 
lie between f and b cannot be less than the ratio of pt to AB, and therefore must be greater 
than the ratio of fc to AB (since pt is greater than fc). And after the same manner you may 
prove that the forementioned probability cannot be greater than the ratio of fb to AB, it 
must therefore be the same. 


Lem. 2. The ball W having been thrown, and the line os drawn, the probability of the 
event M in a single trial is the ratio of Ao to AB. 

For, in the same manner as in the foregoing lemma, the probability that the ball o being 
thrown shall rest somewhere upon Do or between AD and so is the ratio of Ao to AB. But the 
resting of the ball o between AD and so after a single throw is the happening of the event M 
in a single trial. Wherefore the lemma is manifest. 
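
The postulates and the two lemmas describe an experiment that is easy to imitate numerically. The sketch below is only an illustration under stated assumptions: the table is taken to be the unit square, the position of the line os is represented by the single number Ao/AB, and the names `throw_W` and `count_M` are ours, not Bayes's.

```python
import random


def throw_W(rng):
    """Postulate 1 and Lemma 1: W rests uniformly on the table, so the
    ratio Ao/AB fixed by the line os is uniform on [0, 1)."""
    return rng.random()


def count_M(ao_over_ab, n, rng):
    """Throw the second ball n times. The event M happens in a trial when
    the ball rests between AD and os, which by Lemma 2 has chance Ao/AB."""
    return sum(rng.random() < ao_over_ab for _ in range(n))


if __name__ == "__main__":
    rng = random.Random(1763)      # fixed seed so the run is repeatable
    theta = throw_W(rng)
    n = 10_000
    # The observed frequency of M should lie close to Ao/AB.
    print(theta, count_M(theta, n, rng) / n)
```

With a large number of throws the relative frequency of M settles near Ao/AB, which is the content of Lemma 2.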


Prop. 8 

If upon BA you erect the figure BghikmA whose property is this, that (the base BA being 
divided into any two parts, as Ab, and Bb and at the point of division b a perpendicular being 
erected and terminated by the figure in m; and y, x, r representing respectively the ratio of 
bm, Ab, and Bb to AB, and E being the coefficient of the term in which occurs $a^p b^q$ when the 
binomial $(a+b)^{p+q}$ is expanded) $y = E x^p r^q$, I say that before the ball W is thrown, the probability 
the point o should fall between f and b, any two points named in the line AB, and withal 
that the event M should happen p times and fail q in p + q trials, is the ratio of fghikmb, the 
part of the figure BghikmA intercepted between the perpendiculars fg, bm raised upon the 
line AB, to CA the square upon AB. 


DEMONSTRATION 


For if not; first let it be the ratio of D a figure greater than fghikmb to CA, and through the 
points e,d,c draw perpendiculars to fb meeting the curve AmigB in h, i, k; the point d being 
so placed that di shall be the longest of the perpendiculars terminated by the line fb, and the 
curve AmigB; and the points e, d, c being so many and so placed that the rectangles bk, ci, 
ei, fh taken together shall differ less from fghikmb than D does; all which may be easily done 
by the help of the equation of the curve, and the difference between D and the figure fghikmb 
given. Then since di is the longest of the perpendicular ordinates that insist upon fb, the rest 
will gradually decrease as they are farther and farther from it on each side, as appears from 
the construction of the figure, and consequently eh is greater than gf or any other ordinate 
that insists upon ef. 
Now if Ao were equal to Ae, then by lem. 2 the probability of the event M in a single trial 
would be the ratio of Ae to AB, and consequently by cor. Prop. 1 the probability of it’s 
failure would be the ratio of Be to AB. Wherefore, if x and r be the two forementioned ratios 
respectively, by Prop. 7 the probability of the event M happening p times and failing q in 
p + q trials would be $E x^p r^q$. But x and r being respectively the ratios of Ae to AB and Be to 
AB, if y is the ratio of eh to AB, then, by construction of the figure AiB, $y = E x^p r^q$. Wherefore, 
if Ao were equal to Ae the probability of the event M happening p times and failing q in 
p + q trials would be y, or the ratio of eh to AB. And if Ao were equal to Af, or were any mean 
between Ae and Af, the last mentioned probability for the same reasons would be the ratio of 
fg or some other of the ordinates insisting upon ef, to AB. But eh is the greatest of all the 
ordinates that insist upon ef. Wherefore, upon supposition the point should lie anywhere 
between f and e, the probability that the event M happens p times and fails q in p + q trials 
cannot be greater than the ratio of eh to AB. There then being these two subsequent events, 
the 1st that the point o will lie between e and f, the 2nd that the event M will happen 
p times and fail q in p + q trials, and the probability of the first (by lemma 1) is the ratio 
of ef to AB, and upon supposition the 1st happens, by what has been now proved, the 
probability of the 2nd cannot be greater than the ratio of eh to AB, it evidently follows (from 
Prop. 3) that the probability both together will happen cannot be greater than the ratio 
compounded of that of ef to AB and that of eh to AB, which compound ratio is the ratio of 
fh to CA. Wherefore, the probability that the point o will lie between f and e, and the event 
M happen p times and fail q, is not greater than the ratio of fh to CA. And in like manner 
the probability the point o will lie between e and d, and the event M happen and fail as before, 
cannot be greater than the ratio of ei to CA. And again, the probability the point o will lie 
between d and c, and the event M happen and fail as before, cannot be greater than the 
ratio of ci to CA. And lastly, the probability that the point o will lie between c and b, and the 
event M happen and fail as before, cannot be greater than the ratio of bk to CA. Add now 
all these several probabilities together, and their sum (by Prop. 1) will be the probability 
that the point will lie somewhere between f and b, and the event M happen p times and fail 
q in p + q trials. Add likewise the correspondent ratios together, and their sum will be the 
ratio of the sum of the antecedents to their common consequent, i.e. the ratio of fh, ei, ci, bk 
together to CA; which ratio is less than that of D to CA, because D is greater than fh, ei, ci, bk 
together. And therefore, the probability that the point o will lie between f and b, and withal 
that the event M will happen p times and fail q in p + q trials, is less than the ratio of D to CA; 
but it was supposed the same which is absurd. And in like manner, by inscribing rectangles 
within the figure, as eg, dh, dk, cm, you may prove that the last mentioned probability is 
greater than the ratio of any figure less than fghikmb to CA. 
Wherefore, that probability must be the ratio of fghikmb to CA. 


Cor. Before the ball W is thrown the probability that the point o will lie somewhere 
between A and B, or somewhere upon the line AB, and withal that the event M will happen 
p times, and fail q in p + q trials is the ratio of the whole figure AiB to CA. But it is certain 
that the point o will lie somewhere upon AB. Wherefore, before the ball W is thrown the 
probability the event M will happen p times and fail q in p + q trials is the ratio of AiB to CA. 


Prop. 9 


If before anything is discovered concerning the place of the point o, it should appear that 
the event M had happened p times and failed q in p + q trials, and from hence I guess that the 
point o lies between any two points in the line AB, as f and b, and consequently that the 
probability of the event M in a single trial was somewhere between the ratio of Ab to AB and 
that of Af to AB: the probability I am in the right is the ratio of that part of the figure AiB 
described as before which is intercepted between perpendiculars erected upon AB at the 
points f and b, to the whole figure AiB. 

For, there being these two subsequent events, the first that the point o will lie between 
f and b; the second that the event M should happen p times and fail q in p + q trials; and (by 
cor. prop. 8) the original probability of the second is the ratio of AiB to CA, and (by prop. 8) 
the probability of both is the ratio of fghimb to CA; wherefore (by prop. 5) it being first 
discovered that the second has happened, and from hence I guess that the first has happened 
also, the probability I am in the right is the ratio of fghimb to AiB, the point which was to 
be proved. 
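
In modern notation, Props. 8 and 9 can be read together as the statement below. This is offered as a gloss, not a quotation: the symbol $\theta$ for the ratio Ao/AB and the limits $x_b = Ab/AB \le x_f = Af/AB$ are our labels, and the integrals stand for the areas whose ratios to CA and to AiB the propositions describe.

```latex
% Prop. 8: joint probability, before W is thrown (area fghikmb : square CA)
\Pr\{x_b < \theta < x_f \ \text{and}\ M \text{ happens } p \text{ times, fails } q\}
  = \int_{x_b}^{x_f} E\,x^{p}(1-x)^{q}\,dx ,
\qquad E = \binom{p+q}{p}.

% Prop. 9: the chance of guessing rightly, once the p happenings and
% q failures are known (area fghikmb : whole figure AiB)
\Pr\{x_b < \theta < x_f \mid p \text{ happenings},\ q \text{ failures}\}
  = \frac{\int_{x_b}^{x_f} x^{p}(1-x)^{q}\,dx}{\int_{0}^{1} x^{p}(1-x)^{q}\,dx}.
```

The coefficient E cancels in the second ratio, which is why contracting the ordinates in the ratio of E to one leaves the result unchanged.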


Cor. The same things supposed, if I guess that the probability of the event M lies some- 
where between 0 and the ratio of Ab to AB, my chance to be in the right is the ratio of 
Abm to AiB. 


Scholium 


From the preceding proposition it is plain, that in the case of such an event as I there call M, 
from the number of times it happens and fails in a certain number of trials, without knowing 
anything more concerning it, one may give a guess whereabouts it’s probability is, and, by 
the usual methods computing the magnitudes of the areas there mentioned, see the chance 
that the guess is right. And that the same rule is the proper one to be used in the case of an 
event concerning the probability of which we absolutely know nothing antecedently to any 
trials made concerning it, seems to appear from the following consideration; viz. that 
concerning such an event I have no reason to think that, in a certain number of trials, it 
should rather happen any one possible number of times than another. For, on this account, 
I may justly reason concerning it as if its probability had been at first unfixed, and then 
determined in such a manner as to give me no reason to think that, in a certain number of 
trials, it should rather happen any one possible number of times than another. But this is 
exactly the case of the event M. For before the ball W is thrown, which determines it’s 
probability in a single trial (by cor. prop. 8), the probability it has to happen p times and fail 
q in p + q or n trials is the ratio of AiB to CA, which ratio is the same when p + q or n is given, 
whatever number p is; as will appear by computing the magnitude of AiB by the method 
of fluxions.* And consequently before the place of the point o is discovered or the number of 
times the event M has happened in n trials, I can have no reason to think it should rather 
happen one possible number of times than another. 
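
The claim that the ratio of AiB to CA is the same whatever p is, once n is given, can be checked exactly. The sketch below is an illustration only: it assumes the figure's ordinate is $E x^p (1-x)^q$ on a base of length 1, expands $(1-x)^q$ by the binomial theorem, and integrates term by term. The function name is ours.

```python
from fractions import Fraction
from math import comb


def area_ratio_AiB_to_CA(p, q):
    """Area under y = E x^p (1 - x)^q for x in [0, 1], found by expanding
    (1 - x)^q and integrating each power of x exactly."""
    E = comb(p + q, p)
    integral = sum(
        Fraction((-1) ** k * comb(q, k), p + k + 1) for k in range(q + 1)
    )
    return E * integral


if __name__ == "__main__":
    n = 6
    # Every value printed should be the same fraction, 1/(n + 1).
    print([area_ratio_AiB_to_CA(p, n - p) for p in range(n + 1)])
```

Each of the n + 1 possible numbers of happenings thus has the same chance before the ball W is thrown, which is the property the scholium relies on.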

In what follows therefore I shall take for granted that the rule given concerning the event 
M in prop. 9 is also the rule to be used in relation to any event concerning the probability of 
which nothing at all is known antecedently to any trials made or observed concerning it. 
And such an event I shall call an unknown event. 


Cor. Hence, by supposing the ordinates in the figure AiB to be contracted in the ratio of 
E to one, which makes no alteration in the proportion of the parts of the figure intercepted 
between them, and applying what is said of the event M to an unknown event, we have the 
following proposition, which gives the rules for finding the probability of an event from the 
number of times it actually happens and fails. 


Prop. 10 


If a figure be described upon any base AH (Vid. Fig.) having for it’s equation $y = x^p r^q$, 
where y, x, r are respectively the ratios of an ordinate of the figure insisting on the base at 
right angles, of the segment of the base intercepted between the ordinate and A the beginning 
of the base, and of the other segment of the base lying between the ordinate and the point H, 
to the base as their common consequent. I say then that if an unknown event has happened 
p times and failed q in p + q trials, and in the base AH taking any two points as f and t you 
erect the ordinates fC, tF at right angles with it, the chance that the probability of the event 
lies somewhere between the ratio of Af to AH and that of At to AH, is the ratio of tFCf, that 
part of the before-described figure which is intercepted between the two ordinates, to ACFH 
the whole figure insisting on the base AH. 

This is evident from prop. 9 and the remarks made in the foregoing scholium and corollary. 

Now, in order to reduce the foregoing rule to practice, we must find the value of the 
area of the figure described and the several parts of it separated, by ordinates perpendicular 
to its base. For which purpose, suppose AH = 1 and HO the square upon AH likewise = 1, 
and Cf will be = y, and Af = x, and Hf = r, because y, x and r denote the ratios of Cf, Af, 
and Hf respectively to AH. And by the equation of the curve $y = x^p r^q$ and (because 
Af + fH = AH) r + x = 1. Wherefore 

$$y = x^p (1-x)^q = x^p - q x^{p+1} + \frac{q(q-1)}{2} x^{p+2} - \frac{q(q-1)(q-2)}{2 \cdot 3} x^{p+3} + \text{etc.}$$

Now the abscisse being x and the ordinate $x^p$ the correspondent area is $x^{p+1}/(p+1)$ (by 
prop. 10, cas. 1, Quadrat. Newt.)† and the ordinate being $q x^{p+1}$ the area is $q x^{p+2}/(p+2)$; and 


* It will be proved presently in art. 4 by computing in the method here mentioned that AiB contracted 
in the ratio of E to 1 is to CA as 1 to (n+1)E: from whence it plainly follows that, antecedently to this 
contraction, AiB must be to CA in the ratio of 1 to n+1, which is a constant ratio when n is given, 
whatever p is. 

† ’Tis very evident here, without having recourse to Sir Isaac Newton, that the fluxion of the area 

in like manner of the rest. Wherefore, the abscisse being x and the ordinate y or 
$x^p - q x^{p+1} +$ etc. the correspondent area is 

$$\frac{x^{p+1}}{p+1} - \frac{q\,x^{p+2}}{p+2} + \frac{q(q-1)\,x^{p+3}}{2(p+3)} - \frac{q(q-1)(q-2)\,x^{p+4}}{2 \cdot 3\,(p+4)} + \text{etc.}$$


Wherefore, if x = Af = Af/(AH), and y = Cf = Cf/(AH), then 

$$ACf = \frac{ACf}{HO} = \frac{x^{p+1}}{p+1} - \frac{q\,x^{p+2}}{p+2} + \frac{q(q-1)\,x^{p+3}}{2(p+3)} - \frac{q(q-1)(q-2)\,x^{p+4}}{2 \cdot 3\,(p+4)} + \text{etc.}$$

From which equation, if q be a small number, it is easy to find the value of the ratio of 
ACf to HO and in like manner as that was found out, it will appear that the ratio of HCf 
to HO is 

$$\frac{r^{q+1}}{q+1} - \frac{p\,r^{q+2}}{q+2} + \frac{p(p-1)\,r^{q+3}}{2(q+3)} - \frac{p(p-1)(p-2)\,r^{q+4}}{2 \cdot 3\,(q+4)} + \text{etc.}$$

which series will consist of few terms and therefore is to be used when p is small. 
2. The same things supposed as before, the ratio of ACf to HO is 

where n = p + q. For this series is the same with $x^{p+1}/(p+1) - q x^{p+2}/(p+2) +$ etc. set down 
in Art. 1st as the value of the ratio of ACf to HO; as will easily be seen by putting in the former 
instead of r its value 1 − x, and expanding the terms and ordering them according to the 
powers of x. Or, more readily, by comparing the fluxions of the two series, and in the former 
instead of $\dot{r}$ substituting $-\dot{x}$.* 


3. In like manner, the ratio of HCf to HO is 

* The fluxion of the first series is 

or, substituting $-\dot{x}$ for $\dot{r}$, 

which, as all the terms after the first destroy one another, is equal to 

= the fluxion of the latter series, or of   + etc. 


The two series therefore are the same. 

4. If E be the coefficient of that term of the binomial $(a+b)^{p+q}$ expanded in which occurs 
$a^p b^q$, the ratio of the whole figure ACFH to HO is $\{(n+1)E\}^{-1}$, n being = p + q. For, when 
Af = AH, x = 1, r = 0. Wherefore, all the terms of the series set down in Art. 2 as expressing 
the ratio of ACf to HO will vanish except the last, and that becomes 

$$\frac{q(q-1)\cdots 1}{(n+1)(p+1)(p+2)\cdots n}.$$

But E being the coefficient of that term in the binomial $(a+b)^n$ expanded in which occurs 
$a^p b^q$ is equal to 

$$\frac{(p+1)(p+2)\cdots n}{q(q-1)\cdots 1}.$$

And, because Af is supposed to become = AH, ACf = ACH. From whence this article is 
plain. 
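
Art. 4 uses only the last term of the series of Art. 2. A form of that series consistent with the last term quoted above can be obtained by repeated integration by parts. It is offered here as a reconstruction under that assumption, not as a transcription of Bayes's own expression; the integration variable t is ours.

```latex
% Ratio of ACf to HO, with r = 1 - x and n = p + q
\int_0^x t^{p}(1-t)^{q}\,dt
  = \frac{x^{p+1} r^{q}}{p+1}
  + \frac{q\,x^{p+2} r^{q-1}}{(p+1)(p+2)}
  + \frac{q(q-1)\,x^{p+3} r^{q-2}}{(p+1)(p+2)(p+3)}
  + \cdots
  + \frac{q(q-1)\cdots 1\;x^{n+1}}{(p+1)(p+2)\cdots n\,(n+1)} .

% At x = 1 (so r = 0) every term but the last vanishes, leaving
\int_0^1 t^{p}(1-t)^{q}\,dt
  = \frac{q(q-1)\cdots 1}{(n+1)(p+1)(p+2)\cdots n}
  = \frac{1}{(n+1)E}.
```

Each step of the integration by parts lowers the power of r by one and raises the power of x by one, so the series ends after q + 1 terms.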


5. The ratio of ACf to the whole figure ACFH is (by Art. 1 and 4) 

$$(n+1)E\left[\frac{x^{p+1}}{p+1} - \frac{q\,x^{p+2}}{p+2} + \frac{q(q-1)\,x^{p+3}}{2(p+3)} - \text{etc.}\right]$$

and if, as x expresses the ratio of Af to AH, X should express the ratio of At to AH; the 
ratio of AFt to ACFH would be 

$$(n+1)E\left[\frac{X^{p+1}}{p+1} - \frac{q\,X^{p+2}}{p+2} + \frac{q(q-1)\,X^{p+3}}{2(p+3)} - \text{etc.}\right]$$

and consequently the ratio of tFCf to ACFH is (n+1)E multiplied into the difference 
between the two series. Compare this with prop. 10 and we shall have the following practical 
rule. 


Rule 1 


If nothing is known concerning an event but that it has happened p times and failed q in 
p + q or n trials, and from hence I guess that the probability of its happening in a single trial 
lies somewhere between any two degrees of probability as X and x, the chance I am in the 
right in my guess is (n+1)E multiplied into the difference between the series 

$$\frac{X^{p+1}}{p+1} - \frac{q\,X^{p+2}}{p+2} + \frac{q(q-1)\,X^{p+3}}{2(p+3)} - \text{etc.}$$

and the series 

$$\frac{x^{p+1}}{p+1} - \frac{q\,x^{p+2}}{p+2} + \frac{q(q-1)\,x^{p+3}}{2(p+3)} - \text{etc.}$$

E being the coefficient of $a^p b^q$ when $(a+b)^n$ is expanded. 

This is the proper rule to be used when q is a small number; but if q is large and p small, 
change everywhere in the series here set down p into q and q into p and x into r or (1−x), 
and X into R = (1−X); which will not make any alteration in the difference between the 
two series. 
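
Rule 1 can be evaluated exactly, since each series stops after q + 1 terms. The sketch below is a minimal illustration, assuming the general term of the series is $(-1)^k \binom{q}{k} x^{p+k+1}/(p+k+1)$, which agrees with the terms written out in the rule; the names `series` and `rule1` are ours.

```python
from fractions import Fraction
from math import comb


def series(x, p, q):
    """x^(p+1)/(p+1) - q x^(p+2)/(p+2) + q(q-1) x^(p+3)/(2(p+3)) - etc.
    The series is finite: it has q + 1 terms."""
    x = Fraction(x)
    return sum(
        (-1) ** k * comb(q, k) * x ** (p + k + 1) / (p + k + 1)
        for k in range(q + 1)
    )


def rule1(p, q, x, X):
    """Chance that the single-trial probability lies between x and X,
    given p happenings and q failures and nothing else known."""
    n = p + q
    E = comb(n, p)                 # coefficient of a^p b^q in (a + b)^n
    return (n + 1) * E * (series(X, p, q) - series(x, p, q))


if __name__ == "__main__":
    # One happening, no failure: chance the probability exceeds 1/2.
    print(rule1(1, 0, Fraction(1, 2), 1))
```

Taking x = 0 and X = 1 gives a chance of exactly 1 for any p and q, which is a convenient check on the factor (n+1)E.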

Thus far Mr Bayes’s essay. 

With respect to the rule here given, it is further to be observed, that when both p and q are 
very large numbers, it will not be possible to apply it to practice on account of the multitude 
of terms which the series in it will contain. Mr Bayes, therefore, by an investigation which it 
would be too tedious to give here, has deduced from this rule another, which is as follows. 










Rule 2 


If nothing is known concerning an event but that it has happened p times and failed q in 
p + q or n trials, and from hence I guess that the probability of its happening in a single trial 
lies between (p/n) + z and (p/n) − z; if $m^2 = n^3/(2pq)$, a = p/n, b = q/n, E the coefficient of the 
term in which occurs $a^p b^q$ when $(a+b)^n$ is expanded, and 

$$\Sigma = \frac{n+1}{n}\,\frac{\sqrt{2pq}}{\sqrt{n}}\,E a^p b^q$$

multiplied by the series 

$$mz - \frac{m^3 z^3}{3} + \frac{(n-2)\,m^5 z^5}{2n \cdot 5} - \frac{(n-2)(n-4)\,m^7 z^7}{2n \cdot 3n \cdot 7} + \frac{(n-2)(n-4)(n-6)\,m^9 z^9}{2n \cdot 3n \cdot 4n \cdot 9} - \text{etc.}$$

my chance to be in the right is greater than 

$$\frac{2\Sigma}{1 + 2E a^p b^q + 2E a^p b^q/n}\ {}^{*}$$

and less than 

$$\frac{2\Sigma}{1 - 2E a^p b^q - 2E a^p b^q/n};$$

and if p = q my chance is $2\Sigma$ exactly. 

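In order to render this rule fit for use in all cases it is only necessary to know how to find 
within sufficient nearness the value of $E a^p b^q$ and also of the series $mz - \tfrac{1}{3} m^3 z^3 +$ etc. With 
respect to the former Mr Bayes has proved that, supposing K to signify the ratio of the 
quadrantal arc to its radius, $E a^p b^q$ will be equal to $\tfrac{1}{2}\sqrt{n}/\sqrt{Kpq}$ multiplied by the ratio, 
[h], whose hyperbolic logarithm is 

$$\frac{1}{12}\left[\frac{1}{n} - \frac{1}{p} - \frac{1}{q}\right] - \frac{1}{360}\left[\frac{1}{n^3} - \frac{1}{p^3} - \frac{1}{q^3}\right] + \frac{1}{1260}\left[\frac{1}{n^5} - \frac{1}{p^5} - \frac{1}{q^5}\right] - \frac{1}{1680}\left[\frac{1}{n^7} - \frac{1}{p^7} - \frac{1}{q^7}\right] + \frac{1}{1188}\left[\frac{1}{n^9} - \frac{1}{p^9} - \frac{1}{q^9}\right] - \text{etc.}†$$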


* In Mr Bayes’s manuscript this chance is made to be greater than $2\Sigma/(1 + 2E a^p b^q)$ and less than 
$2\Sigma/(1 - 2E a^p b^q)$. The third term in the two divisors, as I have given them, being omitted. But this being 
evidently owing to a small oversight in the deduction of this rule, which I have reason to think Mr Bayes 
had himself discovered, I have ventured to correct his copy, and to give the rule as I am satisfied it 
ought to be given. 

† A very few terms of this series will generally give the hyperbolic logarithm to a sufficient degree of 
exactness. A similar series has been given by Mr DeMoivre, Mr Simpson and other eminent mathematicians 
in an expression for the sum of the logarithms of the numbers 1, 2, 3, 4, 5, to x, which sum they 
have asserted to be equal to 

$$\tfrac{1}{2}\log c + (x + \tfrac{1}{2})\log x - x + \frac{1}{12x} - \frac{1}{360x^3} + \frac{1}{1260x^5} - \text{etc.}$$

c denoting the circumference of a circle whose radius is unity. But Mr Bayes, in a preceding paper in this 
volume, has demonstrated that, though this expression will very nearly approach to the value of this 
sum when only a proper number of the first terms is taken, the whole series cannot express any quantity 
at all, because, let x be what it will, there will always be a part of the series where it will begin to diverge. 
This observation, though it does not much affect the use of this series, seems well worth the notice of 
mathematicians. 

where the numeral coefficients may be found in the following manner. Call them 
A, B, C, D, E etc. Then 

where the coefficients of B, C, D, E, F, etc. in the values of D, E, F, etc. are the 2, 3, 4, etc. 
highest coefficients in $(a+b)^7$, $(a+b)^9$, $(a+b)^{11}$, etc. expanded; affixing in every particular 
value the least of these coefficients to B, the next in magnitude to the furthest letter from B, 
the next to C, the next to the furthest but one, the next to D, the next to the furthest but 
two, and so on.* 
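
The approximation to $E a^p b^q$ described above can be compared with the exact value. The sketch below is an illustration under two assumptions of ours: K is taken as $\pi/2$ (the quadrantal arc divided by the radius), and the numeral coefficients are taken as 1/12, 1/360, 1/1260, 1/1680, 1/1188 with alternating signs, which are also the coefficients of the series for the sum of logarithms quoted in the note. The function names are ours, and p and q must both be at least 1.

```python
from math import comb, exp, pi, sqrt


def exact_Eapbq(p, q):
    """E a^p b^q with a = p/n, b = q/n and E the binomial coefficient."""
    n = p + q
    return comb(n, p) * (p / n) ** p * (q / n) ** q


def approx_Eapbq(p, q, terms=3):
    """(1/2) sqrt(n) / sqrt(K p q) times the ratio h, where log h is the
    series in 1/n - 1/p - 1/q, 1/n^3 - 1/p^3 - 1/q^3, and so on."""
    n = p + q
    K = pi / 2                     # quadrantal arc divided by its radius
    coeffs = [1 / 12, -1 / 360, 1 / 1260, -1 / 1680, 1 / 1188]
    powers = (1, 3, 5, 7, 9)
    log_h = sum(
        c * (1 / n**k - 1 / p**k - 1 / q**k)
        for c, k in zip(coeffs[:terms], powers)
    )
    return 0.5 * sqrt(n) / sqrt(K * p * q) * exp(log_h)


if __name__ == "__main__":
    print(exact_Eapbq(5, 5), approx_Eapbq(5, 5))
```

Even with p = q = 5 the first three terms of the series bring the two values into close agreement, which bears out the remark that a very few terms generally suffice.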

With respect to the value of the series 

$$mz - \frac{m^3 z^3}{3} + \frac{(n-2)\,m^5 z^5}{2n \cdot 5} - \text{etc.}$$

he has observed that it may be calculated directly when mz is less than 1, or even not greater 
than $\sqrt{3}$: but when mz is much larger it becomes impracticable to do this; in which case he 
shews a way of easily finding two values of it very nearly equal between which its true value 
must lie. 

The theorem he gives for this purpose is as follows. 

Let K, as before, stand for the ratio of the quadrantal arc to its radius, and H for the ratio 
whose hyperbolic logarithm is 


22] 24-1] 2-1 98-1 
In 360n3* 1260n®  1680n? 





+ ete. 


Then the series $mz - \tfrac{1}{3} m^3 z^3 +$ etc. will be greater or less than the series 

etc. 

continued to any number of terms, according as the last term has a positive or a negative 
sign before it. 
From substituting these values of $E a^p b^q$ and 

$$mz - \frac{m^3 z^3}{3} + \frac{(n-2)\,m^5 z^5}{2n \cdot 5} - \text{etc.}$$

in the second rule arises a third rule, which is the rule to be used when mz is of some considerable 
magnitude. 


* This method of finding these coefficients I have deduced from the demonstration of the third lemma 
at the end of Mr Simpson’s Treatise on the Nature and Laws of Chance. 


Rule 3 


If nothing is known of an event but that it has happened p times and failed q in p+q or 
n trials, and from hence I judge that the probability of its happening in a single trial lies 
between p/n +z and p/n—z my chance to be right is greater than 


4,/(Kpq)h oH J2(n+1)(1- ei 


\(Kpq) + hnt + hn JK (n+2) mz 





and less than 


(Kpq) —hnt — hn-*\" VK (n+ 2) mz ALK (n +2) (n +4) 2323 | 





EV(Kpg)h fa py_ V2 (+1) (1—2m%e?/n)brt 2m (m+ 1) (1 2m*2?/m)br? 








where $m^2$, K, h and H stand for the quantities already explained. 


AN APPENDIX 


Containing an application of the foregoing Rules to some particular Cases 


The first rule gives a direct and perfect solution in all cases; and the two following rules are only particular 
methods of approximating to the solution given in the first rule, when the labour of applying it becomes 
too great. 

The first rule may be used in all cases where either p or q are nothing or not large. The second rule may 
be used in all cases where mz is less than $\sqrt{3}$; and the third in all cases where $m^2 z^2$ is greater than 1 and less 
than $\tfrac{1}{2}n$, if n is an even number and very large. If n is not large this last rule cannot be much wanted, 
because, m decreasing continually as n is diminished, the value of z may in this case be taken large, (and 
therefore a considerable interval had between p/n − z and p/n + z), and yet the operation be carried on by 
the second rule; or mz not exceed $\sqrt{3}$. 

But in order to shew distinctly and fully the nature of the present problem, and how far Mr Bayes has 
carried the solution of it; I shall give the result of this solution in a few cases, beginning with the lowest 
and most simple. 

Let us then first suppose, of such an event as that called M in the essay, or an event about the 
probability of which, antecedently to trials, we know nothing, that it has happened once, and that it is enquired 
what conclusion we may draw from hence with respect to the probability of it’s happening on a second 
trial. 

The answer is that there would be an odds of three to one for somewhat more than an even chance that 
it would happen on a second trial. 

For in this case, and in all others where q is nothing, the expression 

$$(n+1)\left[\frac{X^{p+1}}{p+1} - \frac{x^{p+1}}{p+1}\right] \quad\text{or}\quad X^{p+1} - x^{p+1}$$

gives the solution, as will appear from considering the first rule. Put therefore in this expression 
p + 1 = 2, X = 1 and x = $\tfrac{1}{2}$ and it will be $1 - (\tfrac{1}{2})^2$ or $\tfrac{3}{4}$; which shews the chance there is that the probability 
of an event that has happened once lies somewhere between 1 and $\tfrac{1}{2}$; or (which is the same) the odds that 
it is somewhat more than an even chance that it will happen on a second trial.* 

In the same manner it will appear that if the event has happened twice, the odds now mentioned will be 
seven to one; if thrice, fifteen to one; and in general, if the event has happened p times, there will be 
an odds of $2^{p+1} - 1$ to one, for more than an equal chance that it will happen on further trials. 

Again, suppose all I know of an event to be that it has happened ten times without failing, and the 
enquiry to be what reason we shall have to think we are right if we guess that the probability of it’s 
happening in a single trial lies somewhere between $\tfrac{16}{17}$ and $\tfrac{2}{3}$, or that the ratio of the causes of it’s 
happening to those of it’s failure is some ratio between that of sixteen to one and two to one. 

Here p + 1 = 11, X = $\tfrac{16}{17}$ and x = $\tfrac{2}{3}$ and $X^{p+1} - x^{p+1} = (\tfrac{16}{17})^{11} - (\tfrac{2}{3})^{11} = 0.5013$ etc. The answer therefore 
is, that we shall have very nearly an equal chance for being right. 
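
The cases above, in which the event has never failed, reduce to the expression $X^{p+1} - x^{p+1}$ and are easily evaluated. The sketch below is illustrative only; the function names are ours, and the odds are formed as the chance divided by its complement.

```python
from fractions import Fraction


def chance_between(p, x, X):
    """First rule with q = 0: chance that the single-trial probability
    lies between x and X after p happenings and no failure."""
    return Fraction(X) ** (p + 1) - Fraction(x) ** (p + 1)


def odds_more_than_even(p):
    """Odds that the probability exceeds 1/2 after p happenings."""
    c = chance_between(p, Fraction(1, 2), 1)
    return c / (1 - c)


if __name__ == "__main__":
    print([odds_more_than_even(p) for p in (1, 2, 3)])
    # The ten-trial example, for comparison with the figure in the text.
    print(float(chance_between(10, Fraction(2, 3), Fraction(16, 17))))
```

The odds come out as three, seven and fifteen to one for one, two and three happenings, following the general form $2^{p+1} - 1$, and the ten-trial example evaluates to very nearly an even chance.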


* There can, I suppose, be no reason for observing that on this subject unity is always made to stand 
for certainty, and $\tfrac{1}{2}$ for an even chance. 

In this manner we may determine in any case what conclusion we ought to draw from a given number 
of experiments which are unopposed by contrary experiments. Every one sees in general that there is 
reason to expect an event with more or less confidence according to the greater or less number of times in 
which, under given circumstances, it has happened without failing; but we here see exactly what this 
reason is, on what principles it is founded, and how we ought to regulate our expectations. 

But it will be proper to dwell longer on this head. 

Suppose a solid or die of whose number of sides and constitution we know nothing; and that we are to 
judge of these from experiments made in throwing it. 

In this case, it should be observed, that it would be in the highest degree improbable that the solid 
should, in the first trial, turn any one side which could be assigned beforehand; because it would be 
known that some side it must turn, and that there was an infinity of other sides, or sides otherwise 
marked, which it was equally likely that it should turn. The first throw only shews that it has the side 
then thrown, without giving any reason to think that it has it any one number of times rather than any 
other. It will appear, therefore, that after the first throw and not before, we should be in the circumstances 
required by the conditions of the present problem, and that the whole effect of this throw would be to
bring us into these circumstances. That is: the turning the side first thrown in any subsequent single trial 
would be an event about the probability or improbability of which we could form no judgment, and of 
which we should know no more than that it lay somewhere between nothing and certainty. With the 
second trial then our calculations must begin; and if in that trial the supposed solid turns again the same 
side, there will arise the probability of three to one that it has more of that sort of sides than of all others; 
or (which comes to the same) that there is somewhat in its constitution disposing it to turn that side 
oftenest: And this probability will increase, in the manner already explained, with the number of times in 
which that side has been thrown without failing. It should not, however, be imagined that any number 
of such experiments can give sufficient reason for thinking that it would never turn any other side. For, 
suppose it has turned the same side in every trial a million of times. In these circumstances there would 
be an improbability that it has less than 1,400,000 more of these sides than all others; but there would 
also be an improbability that it had above 1,600,000 times more. The chance for the latter is expressed
by 1,600,000/1,600,001 raised to the millioneth power subtracted from unity, which is equal to 0.4647 etc.,
and the chance for the former is equal to 1,400,000/1,400,001 raised to the same power, or to 0.4895;
which, being both less than an equal chance, proves what I have said. But though it would be thus 
improbable that it had above 1,600,000 times more or less than 1,400,000 times more of these sides than of 
all others, it by no means follows that we have any reason for judging that the true proportion in this case 
lies somewhere between that of 1,600,000 to one and 1,400,000 to one. For he that will take the pains to 
make the calculation will find that there is nearly the probability expressed by 0.527, or but little more
than an equal chance, that it lies somewhere between that of 600,000 to one and three millions to one. 
It may deserve to be added, that it is more probable that this proportion lies somewhere between that 
of 900,000 to 1 and 1,900,000 to 1 than between any other two proportions whose antecedents are to one 
another as 900,000 to 1,900,000, and consequents unity. 
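
The figures in this paragraph can be reproduced with the following sketch. It assumes, as the text states, that the ratios are raised to the millionth power; the helper name is hypothetical, and logarithms are used only to keep the computation numerically stable.

```python
import math


def power_of_ratio(k, n):
    """(k / (k + 1)) ** n, computed through logarithms for stability."""
    return math.exp(-n * math.log1p(1.0 / k))


N = 10 ** 6  # the power used in the text

above_1_600_000 = 1.0 - power_of_ratio(1_600_000, N)
below_1_400_000 = power_of_ratio(1_400_000, N)
between_600k_and_3m = power_of_ratio(3_000_000, N) - power_of_ratio(600_000, N)

print(round(above_1_600_000, 4))      # chance of more than 1,600,000 to one
print(round(below_1_400_000, 4))      # chance of less than 1,400,000 to one
print(round(between_600k_and_3m, 3))  # chance of lying between the wide limits
```

Both of the first two chances fall short of one half, which is the point being made: neither bound is by itself probable.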

I have made these observations chiefly because they are all strictly applicable to the events and 
appearances of nature. Antecedently to all experience, it would be improbable as infinite to one, that 
any particular event, beforehand imagined, should follow the application of any one natural object to 
another; because there would be an equal chance for any one of an infinity of other events. But if we had 
once seen any particular effects, as the burning of wood on putting it into fire, or the falling of a stone 
on detaching it from all contiguous objects, then the conclusions to be drawn from any number of sub- 
sequent events of the same kind would be to be determined in the same manner with the conclusions just 
mentioned relating to the constitution of the solid I have supposed. In other words. The first experiment 
supposed to be ever made on any natural object would only inform us of one event that may follow 
a particular change in the circumstances of those objects; but it would not suggest to us any ideas of 
uniformity in nature, or give us the least reason to apprehend that it was, in that instance or in any other, 
regular rather than irregular in its operations. But if the same event has followed without interruption in 
any one or more subsequent experiments, then some degree of uniformity will be observed; reason will be 
given to expect the same success in further experiments, and the calculations directed by the solution of 
this problem may be made. 

One example here it will not be amiss to give. 

Let us imagine to ourselves the case of a person just brought forth into this world, and left to collect 
from his observation of the order and course of events what powers and causes take place in it. The Sun 
would, probably, be the first object that would engage his attention; but after losing it the first night he 
would be entirely ignorant whether he should ever see it again. He would therefore be in the condition of 
a person making a first experiment about an event entirely unknown to him. But let him see a second 
appearance or one return of the Sun, and an expectation would be raised in him of a second return, and he 
might know that there was an odds of 3 to 1 for some probability of this. This odds would increase, as
before represented, with the number of returns to which he was witness. But no finite number of returns 
would be sufficient to produce absolute or physical certainty. For let it be supposed that he has seen it 
return at regular and stated intervals a million of times. The conclusions this would warrant would be 
such as follow. There would be the odds of the millioneth power of 2, to one, that it was likely that it would 
return again at the end of the usual interval. There would be the probability expressed by 0.5352, that
the odds for this was not greater than 1,600,000 to 1; and the probability expressed by 0.5105, that it was
not less than 1,400,000 to 1.

It should be carefully remembered that these deductions suppose a previous total ignorance of nature. 
After having observed for some time the course of events it would be found that the operations of nature 
are in general regular, and that the powers and laws which prevail in it are stable and permanent. The 
consideration of this will cause one or a few experiments often to produce a much stronger expectation of
success in further experiments than would otherwise have been reasonable; just as the frequent observa- 
tion that things of a sort are disposed together in any place would lead us to conclude, upon discovering 
there any object of a particular sort, that there are laid up with it many others of the same sort. It is 
obvious that this, so far from contradicting the foregoing deductions, is only one particular case to which 
they are to be applied. 

What has been said seems sufficient to shew us what conclusions to draw from uniform experience. It 
demonstrates, particularly, that instead of proving that events will always happen agreeably to it, there 
will be always reason against this conclusion. In other words, where the course of nature has been the 
most constant, we can have only reason to reckon upon a recurrency of events proportioned to the degree 
of this constancy; but we can have no reason for thinking that there are no causes in nature which will 
ever interfere with the operations of the causes from which this constancy is derived, or no circumstances 
of the world in which it will fail. And if this is true, supposing our only data derived from experience, 
we shall find additional reason for thinking thus if we apply other principles, or have recourse to such 
considerations as reason, independently of experience, can suggest. 

But I have gone further than I intended here; and it is time to turn our thoughts to another branch of 
this subject: I mean, to cases where an experiment has sometimes succeeded and sometimes failed. 

Here, again, in order to be as plain and explicit as possible, it will be proper to put the following case, 
which is the easiest and simplest I can think of. 

Let us then imagine a person present at the drawing of a lottery, who knows nothing of its scheme or of 
the proportion of Blanks to Prizes in it. Let it further be supposed, that he is obliged to infer this from the 
number of blanks he hears drawn compared with the number of prizes; and that it is enquired what 
conclusions in these circumstances he may reasonably make. 

Let him first hear ten blanks drawn and one prize, and let it be enquired what chance he will have for
being right if he guesses that the proportion of blanks to prizes in the lottery lies somewhere between the
proportions of 9 to 1 and 11 to 1.

Here taking X = 11/12, x = 9/10, p = 10, q = 1, n = 11, E = 11, the required chance, according to the first
rule, is (n+1)E multiplied by the difference between

$$\frac{X^{p+1}}{p+1} - \frac{qX^{p+2}}{p+2} \quad\text{and}\quad \frac{x^{p+1}}{p+1} - \frac{qx^{p+2}}{p+2}.$$

There would therefore be an odds of about 923 to 76, or nearly 12 to 1 against his being right. Had
he guessed only in general that there were less than 9 blanks to a prize, there would have been a
probability of his being right equal to 0.6589, or the odds of 65 to 34.

Again, suppose that he has heard 20 blanks drawn and 2 prizes; what chance will he have for being right 
if he makes the same guess? 

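Here X and x being the same, we have n = 22, p = 20, q = 2, E = 231, and the required chance equal to

$$(n+1)E\left\{\left[\frac{X^{p+1}}{p+1} - \frac{qX^{p+2}}{p+2} + \frac{q(q-1)X^{p+3}}{2(p+3)}\right] - \left[\frac{x^{p+1}}{p+1} - \frac{qx^{p+2}}{p+2} + \frac{q(q-1)x^{p+3}}{2(p+3)}\right]\right\} = 0.10843 \text{ etc.}$$
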
He will, therefore, have a better chance for being right than in the former instance, the odds against
him now being 892 to 108 or about 9 to 1. But should he only guess in general, as before, that there were
less than 9 blanks to a prize, his chance for being right will be worse; for instead of 0.6589 or an odds of
near two to one, it will be 0.584, or an odds of 584 to 415.

Suppose, further, that he has heard 40 blanks drawn and 4 prizes; what will the before-mentioned 
chances be? 
The answer here is 0.1525, for the former of these chances; and 0.527, for the latter. There will,
therefore, now be an odds of only 5½ to 1 against the proportion of blanks to prizes lying between 9 to 1 and
11 to 1; and but little more than an equal chance that it is less than 9 to 1.

Once more. Suppose he has heard 100 blanks drawn and 10 prizes. 

The answer here may still be found by the first rule; and the chance for a proportion of blanks to prizes 
less than 9 to 1 will be 0.44109, and for a proportion greater than 11 to 1, 0.3082. It would therefore
be likely that there were not fewer than 9 or more than 11 blanks to a prize. But at the same time it will
remain unlikely* that the true proportion should lie between 9 to 1 and 11 to 1, the chance for this being
0.2506 etc. There will therefore be still an odds of near 3 to 1 against this.
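
The lottery examples can be recomputed with a sketch of the first rule. It assumes the rule is (n+1)E times the finite series $x^{p+1}/(p+1) - qx^{p+2}/(p+2) + \dots$, with E the number of ways of choosing the q failures among n trials; the function name is illustrative. The printed figures were worked by hand, so small differences from the values this prints are to be expected.

```python
from fractions import Fraction
from math import comb


def first_rule(p, q, x):
    """Chance that the single-trial probability is below x after
    p successes and q failures, by the finite series of the first rule."""
    n = p + q
    series = sum(
        Fraction((-1) ** i * comb(q, i)) * x ** (p + 1 + i) / (p + 1 + i)
        for i in range(q + 1)
    )
    return (n + 1) * comb(n, q) * series


LOW, HIGH = Fraction(9, 10), Fraction(11, 12)  # odds of 9 to 1 and 11 to 1

for p, q in [(10, 1), (20, 2), (40, 4), (100, 10)]:
    below = first_rule(p, q, LOW)
    inside = first_rule(p, q, HIGH) - below
    print(p, q, round(float(inside), 5), round(float(below), 4))
```

Exact rational arithmetic is chosen because the series alternates in sign and its terms nearly cancel when q is large.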

From these calculations it appears that, in the circumstances I have supposed, the chance for being
right in guessing the proportion of blanks to prizes to be nearly the same with that of the number of blanks 
drawn in a given time to the number of prizes drawn, is continually increasing as these numbers increase; 
and that therefore, when they are considerably large, this conclusion may be looked upon as morally 
certain. By parity of reason, it follows universally, with respect to every event about which a great 
number of experiments has been made, that the causes of its happening bear the same proportion to the 
causes of its failing, with the number of happenings to the number of failures; and that, if an event whose 
causes are supposed to be known, happens oftener or seldomer than is agreeable to this conclusion, there 
will be reason to believe that there are some unknown causes which disturb the operations of the known 
ones. With respect, therefore, particularly to the course of events in nature, it appears, that there is 
demonstrative evidence to prove that they are derived from permanent causes, or laws originally estab- 
lished in the constitution of nature in order to produce that order of events which we observe, and not 
from any of the powers of chance. This is just as evident as it would be, in the case I have insisted on, 
that the reason of drawing 10 times more blanks than prizes in millions of trials, was, that there were in 
the wheel about so many more blanks than prizes. 

But to proceed a little further in the demonstration of this point. 

We have seen that supposing a person, ignorant of the whole scheme of a lottery, should be led to 
conjecture, from hearing 100 blanks and 10 prizes drawn, that the proportion of blanks to prizes in the 
lottery was somewhere between 9 to 1 and 11 to 1, the chance for his being right would be 0.2506 etc.
Let [us] now enquire what this chance would be in some higher cases. 

Let it be supposed that blanks have been drawn 1000 times, and prizes 100 times in 1100 trials. 

In this case the powers of X and x rise so high, and the number of terms in the two series

$$\frac{X^{p+1}}{p+1} - \frac{qX^{p+2}}{p+2} + \text{etc.} \quad\text{and}\quad \frac{x^{p+1}}{p+1} - \frac{qx^{p+2}}{p+2} + \text{etc.}$$

become so numerous that it would require immense labour to obtain the answer by the first rule. ’Tis
necessary, therefore, to have recourse to the second rule. But in order to make use of it, the interval
between X and x must be a little altered. 10/11 − 9/10 is 1/110, and therefore the interval between 10/11 − 1/110 and
10/11 + 1/110 will be nearly the same with the interval between 9/10 and 11/12, only somewhat larger. If then we
make the question to be; what chance there would be (supposing no more known than that blanks have
been drawn 1000 times and prizes 100 times in 1100 trials) that the probability of drawing a blank in a
single trial would lie somewhere between 10/11 − 1/110 and 10/11 + 1/110 we shall have a question of the same kind
with the preceding questions, and deviate but little from the limits assigned in them.

The answer, according to the second rule, is that this chance is greater than 
* I suppose no attentive person will find any difficulty in this. It is only saying that, supposing the
interval between nothing and certainty divided into a hundred equal chances, there will be 44 of them 
for a less proportion of blanks to prizes than 9 to 1, 31 for a greater than 11 to 1, and 25 for some propor- 
tion between 9 to 1 and 11 to 1; in which it is obvious that, though one of these suppositions must be true, 
yet, having each of them more chances against them than for them, they are all separately unlikely. 

† See Mr De Moivre’s Doctrine of Chances, page 250.
By making here 1000 = p, 100 = q, 1100 = n, 1/110 = z,

$$mz = z\sqrt{\frac{n^3}{pq}} = 1.048808, \qquad Ea^pb^q = \frac{h}{2}\,\frac{\sqrt{n}}{\sqrt{Kpq}},$$

h being the ratio whose hyperbolic logarithm is
and K the ratio of the quadrantal arc to radius; the former of these expressions will be found to be
0.7953, and the latter 0.9405 etc. The chance enquired after, therefore, is greater than 0.7953, and less
than 0.9405. That is; there will be an odds for being right in guessing that the proportion of blanks to
prizes lies nearly between 9 to 1 and 11 to 1, (or exactly between 9 to 1 and 1111 to 99), which is greater than
4 to 1, and less than 16 to 1.
Suppose, again, that no more is known than that blanks have been drawn 10,000 times and prizes 1000 
times in 11,000 trials; what will the chance now mentioned be? 
Here the second as well as the first rule becomes useless, the value of mz being so great as to render
it scarcely possible to calculate directly the series

$$mz - \frac{m^3z^3}{3} + \frac{(n-2)\,m^5z^5}{2n\cdot 5} - \text{etc.}$$

The third rule, therefore, must be used; and the information it gives us is, that the required
chance is greater than 0.97421, or more than an odds of 40 to 1.

By calculations similar to these may be determined universally, what expectations are warranted by 
any experiments, according to the different number of times in which they have succeeded and failed; or 
what should be thought of the probability that any particular cause in nature, with which we have any 
acquaintance, will or will not, in any single trial, produce an effect that has been conjoined with it. 

Most persons, probably, might expect that the chances in the specimen I have given would have been 
greater than I have found them. But this only shews how liable we are to error when we judge on this 
subject independently of calculation. One thing, however, should be remembered here; and that is, the 
narrowness of the interval between 9/10 and 11/12, or between 10/11 + 1/110 and 10/11 − 1/110. Had this interval been
taken a little larger, there would have been a considerable difference in the results of the calculations.
Thus had it been taken double, or z = 1/55, it would have been found in the fourth instance that instead of
odds against there were odds for being right in judging that the probability of drawing a blank in a single
trial lies between 10/11 + 1/55 and 10/11 − 1/55.

The foregoing calculations further shew us the uses and defects of the rules laid down in the essay. 
’Tis evident that the two last rules do not give us the required chances within such narrow limits as could
be wished. But here again it should be considered, that these limits become narrower and narrower as q is 
taken larger in respect of p; and when p and q are equal, the exact solution is given in all cases by the 
second rule. These two rules therefore afford a direction to our judgment that may be of considerable use 
till some person shall discover a better approximation to the value of the two series in the first rule.* 

But what most of all recommends the solution in this Essay is, that it is compleat in those cases where 
information is most wanted, and where Mr De Moivre’s solution of the inverse problem can give little or 
no direction; I mean, in all cases where either p or q are of no considerable magnitude. In other cases, or
when both p and q are very considerable, it is not difficult to perceive the truth of what has been here
demonstrated, or that there is reason to believe in general that the chances for the happening of an event
are to the chances for its failure in the same ratio with that of p to q. But we shall be greatly deceived if we
judge in this manner when either p or q are small. And tho’ in such cases the Data are not sufficient to
discover the exact probability of an event, yet it is very agreeable to be able to find the limits between
which it is reasonable to think it must lie, and also to be able to determine the precise degree of assent
which is due to any conclusions or assertions relating to them.


* Since this was written I have found out a method of considerably improving the approximation in 
the second and third rules by demonstrating that the expression $2\Sigma/\{1 + 2Ea^pb^q + 2Ea^pb^q/n\}$ comes
almost as near to the true value wanted as there is reason to desire, only always somewhat less. It seems 
necessary to hint this here; though the proof of it cannot be given. 
THE PROPERTIES OF A STOCHASTIC MODEL
FOR TWO COMPETING SPECIES 


By P. H. LESLIE AND J. C. GOWER


Bureau of Animal Population, Department of Zoological Field Studies, Oxford 
and Rothamsted Experimental Station 


1. INTRODUCTION 


A stochastic model for studying the properties of certain biological systems by numerical 
methods has been described in an earlier paper (Leslie, 1958), to which reference should be 
made for the full details of its development. Two varieties of this model for the case of two 
competing species have been programmed for the Elliot-N.R.D.C. 401 computer in the 
Statistical Department of Rothamsted Experimental Station, and some of the results 
obtained are given in the following paper. 


2. THE MODELS USED 
Suppose that two species $S_1$ and $S_2$ are competing together in a limited environment, and
that their populations consist of $N_1(t)$ and $N_2(t)$ individuals at time t. Then it is assumed that
the expected balance of births and deaths in the two populations during the discrete interval
of time t to t + 1 is defined by the deterministic model,

$$N_1(t+1) = \frac{\lambda_1}{q_1(t)} N_1(t) = \lambda_1(t) N_1(t), \qquad N_2(t+1) = \frac{\lambda_2}{q_2(t)} N_2(t) = \lambda_2(t) N_2(t), \tag{1-1}$$

where the functions

$$q_1(t) = 1 + \alpha_1 N_1(t) + \beta_1 N_2(t), \qquad q_2(t) = 1 + \alpha_2 N_2(t) + \beta_2 N_1(t),$$

and the constants $\log_e \lambda_1 = r_1$, $\log_e \lambda_2 = r_2$ are the intrinsic rates of increase of the two species.
This system will have a stationary state when $q_1 = \lambda_1$ and $q_2 = \lambda_2$, or when
If we define the ratios

$$\frac{\beta_1(\lambda_2 - 1)}{\alpha_2(\lambda_1 - 1)} = y, \qquad \frac{\beta_1\beta_2}{\alpha_1\alpha_2} = x,$$

then, in the deterministic model, the following four possibilities arise:
x < y < 1. Stable stationary state (both $S_1$ and $S_2$ persist).
1 < y < x. Unstable stationary state (either $S_1$ or $S_2$ persists, depending on the initial state of the system).
y < 1, x > y. $S_1$ persists and $S_2$ disappears from the system.
y > 1, x < y. $S_2$ persists and $S_1$ disappears from the system.
By a suitable choice of the parameters in (1-1) we thus can construct a numerical system
which will fall into one or other of these four categories. 

An innumerable variety of different stochastic models can be imagined which will have 
the same deterministic equivalent, depending on the assumptions made as to the birth-rate 
and death-rate functions for the two populations (cf. Bartlett, 1957). But these possibilities 
can be regarded as falling between the extreme cases of either the birth-rate or the death- 
rate of each species remaining constant. In order to study the qualitative properties of this 
type of system, we may work in terms of these two limits, and we shall consider, therefore, 
the following two models. 


Model I, in which it is assumed that the birth-rate of each species remains constant 
We have in (1-1) the constant

$$\lambda_a = e^{r_a} = e^{b_a - d_a} \quad (a = 1, 2), \tag{1-3}$$

where the intrinsic rate of increase $r_a$ is the difference between a birth-rate $b_a$ and a
death-rate $d_a$ ($b_a > d_a$). Since the birth-rate of each species is assumed to remain constant, we have
for the discrete interval of time t to t + 1,

$$\log_e[\lambda_a/q_a(t)] = \log_e \lambda_a(t) = b_a - d_a(t) \quad (a = 1, 2),$$

where the death-rate $d_a(t)$ is a function of $N_1(t)$ and $N_2(t)$, and is regarded as remaining
constant during the interval. Then, from the standard theory of simple ‘birth’ and ‘death’
processes, it may be shown that if we adopt for our two hypothetical species values of λ,
b and d in (1-3) such as

    λ        b         d
   2.0     1.0083    0.3151
   2.5     1.0308    [...]          (1-4)

then in the stochastic model, we may take

$$E[N_a(t+1)] = \lambda_a(t) N_a(t), \qquad \operatorname{var}[N_a(t+1)] = 2E[N_a(t+1)] \quad (a = 1, 2), \tag{1-5}$$


where $\lambda_a(t)$ is defined in (1-1). In order to simplify matters in practice, we assume as an
approximation that $N_a(t+1)$ is distributed normally with μ and σ² given by (1-5),
attributing all negative values to $N_a(t+1) = 0$. Thus, given $N_1(t)$ and $N_2(t)$ at time t, $N_1(t+1)$ and
$N_2(t+1)$ can be calculated with the help of a pair of random normal deviates, and the
processes can then be continued with the resulting values.
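
One step of model I might be sketched as follows. The coefficients are those of the numerical system with an unstable stationary state, with the subscripts read as in (1-1); the use of real-valued population sizes and of Python's own normal generator are assumptions of this sketch, and all names are illustrative.

```python
import math
import random

LAMBDA = (2.5, 2.0)          # lambda_1, lambda_2
ALPHA = (0.0030, 0.0025)     # effect of each species on itself
BETA = (0.0105, 0.0050)      # effect of the competing species


def expected_rates(n1, n2):
    """lambda_1(t) and lambda_2(t) of (1-1) for the current population sizes."""
    q1 = 1 + ALPHA[0] * n1 + BETA[0] * n2
    q2 = 1 + ALPHA[1] * n2 + BETA[1] * n1
    return LAMBDA[0] / q1, LAMBDA[1] / q2


def step_model_one(n1, n2, rng):
    """Draw N1(t+1) and N2(t+1): normal with mean lambda(t) * N and variance
    twice the mean, negative draws being set to zero."""
    result = []
    for n, rate in zip((n1, n2), expected_rates(n1, n2)):
        mean = rate * n
        draw = rng.gauss(mean, math.sqrt(2.0 * mean))
        result.append(max(0.0, draw))
    return tuple(result)


rng = random.Random(1)       # arbitrary seed, for repeatability only
state = (20.0, 20.0)
for _ in range(5):
    state = step_model_one(*state, rng)
    print(state)
```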


Model II, in which it is assumed that the death-rate of each species remains constant 
In this case we now have in (1-1) for the interval of time t to t + 1,

$$\log_e[\lambda_a/q_a(t)] = \log_e \lambda_a(t) = b_a(t) - d_a \quad (a = 1, 2),$$

where $b_a(t)$ for the particular species is a function of $N_1(t)$ and $N_2(t)$, and the death-rate $d_a$
remains constant. Because no meaning can be attached to a negative birth-rate, we
therefore have to define

$$\lambda_a(t) = \lambda_a/q_a(t) \quad (1 < q_a(t) < e^{b_a}),$$

since $b_a(t) = 0$ when $q_a(t) = e^{b_a}$; and

$$\lambda_a(t) = e^{-d_a} \quad (q_a(t) \ge e^{b_a}).$$
Then, provided we adopt the same values of λ, b and d as given in (1-4), we have

$$E[N_a(t+1)] = \begin{cases} \lambda_a(t)\, N_a(t) & (\lambda_a > \lambda_a(t) \ge e^{-d_a}), \\ e^{-d_a} N_a(t) & (q_a(t) > e^{b_a}), \end{cases} \quad (a = 1, 2) \tag{1-6}$$

and

$$\operatorname{var}[N_a(t+1)] = \begin{cases} f[\lambda_a(t)]\, E[N_a(t+1)] & (\lambda_a > \lambda_a(t) > e^{-d_a}), \\ (1 - e^{-d_a})\, E[N_a(t+1)] & (q_a(t) > e^{b_a}), \end{cases}$$

where the functions $f[\lambda_a(t)]$ in the expression for the variance, for the two values of λ, are
given by the linear relationships,

λ = 2.5: f[λ(t)] = −0.87 + 1.10 λ(t),
λ = 2.0: f[λ(t)] = −0.66 + 1.29 λ(t).

As before, we assume that given $N_1(t)$ and $N_2(t)$ at time t, then $N_1(t+1)$ and $N_2(t+1)$ are
distributed normally with these means and variances.
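
The mean and variance of model II for one species might be computed as in the sketch below. The death-rate is not read from (1-4) but derived as d = b − log λ from (1-3), which is an assumption of this sketch; the dictionary layout and function name are illustrative.

```python
import math

# Birth-rate b and the coefficients (c0, c1) of f = c0 + c1 * lambda(t),
# keyed by the intrinsic multiplication rate lambda.
SPECIES = {
    2.5: {"b": 1.0308, "f": (-0.87, 1.10)},
    2.0: {"b": 1.0083, "f": (-0.66, 1.29)},
}


def model_two_moments(lam, q, n):
    """Mean and variance of N(t+1) under model II for a species with
    multiplication rate lam, crowding function value q and current size n."""
    b = SPECIES[lam]["b"]
    d = b - math.log(lam)            # assumed: lambda = exp(b - d)
    if q < math.exp(b):
        rate = lam / q               # the birth-rate is still positive
        c0, c1 = SPECIES[lam]["f"]
        mean = rate * n
        var = (c0 + c1 * rate) * mean
    else:
        mean = math.exp(-d) * n      # the birth-rate has fallen to zero
        var = (1.0 - math.exp(-d)) * mean
    return mean, var


print(model_two_moments(2.5, 1.0, 100.0))   # uncrowded population
print(model_two_moments(2.0, 3.0, 100.0))   # heavily crowded population
```

At the boundary $q = e^{b}$ the two variance expressions nearly agree, which suggests that the linear relationships were fitted to join the pure death process smoothly.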


3. PROGRAMMING OF THE MODELS 


The programme was so arranged that the constants α₁, α₂, β₁, β₂, λ₁, λ₂ were read into the
computer at the beginning of each run, together with a pair of random numbers and the 
initial population size for each species. It was thus possible to vary these parameters very 
easily and so study the models under different conditions. The random numbers were re- 
quired to start a process for producing pseudo-random numbers and eventually random 
normal deviates; the possibility of using existing tables of random numbers was excluded 
since about 40,000 numbers might be required for each initial point estimated. The method 
adopted was as follows: 
(i) Choose p and x’ at random, e.g. from a table of random numbers. 

(ii) Replace x’ by x, the closest number to x’ such that x ≡ 5 (mod 8).

(iii) Form successively the numbers $px^n$ mod ($2^k$), n = 1, 2, 3, ....

Under these conditions it can be shown (the proof resting on a theorem of Euler) that the
numbers so formed form a repeating cycle of $2^{k-2}$ different numbers. In fact the successive
powers of x are all the $2^{k-2}$ numbers (mod $2^k$) whose last two binary digits are 01, where for
the Elliot 401 computer k is taken to be 32. By choosing different values of p the sequence
may be generated in different orders. The main advantage of this over other methods 
advocated for generating pseudo-random numbers is that it is impossible to get into a closed 
loop generating zero or the same few numbers over and over again. Tests for various types 
of departure from randomness for this method have been reported in the literature (Foster, 
1954; Taussky & Todd, 1956), and it has generally been found to be satisfactory. Con- 
sequently, only a very simple test, which may be regarded as a form of quality control, was 
incorporated in the programme. 
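
Steps (i) to (iii) can be sketched as below. The function names are illustrative; the sketch assumes that p is odd, which is what makes the cycle of $2^{k-2}$ values complete, and it uses a small k in the demonstration so that the whole cycle can be inspected.

```python
from itertools import islice


def nearest_5_mod_8(x_prime):
    """Closest integer to x_prime that is congruent to 5 modulo 8."""
    base = x_prime - ((x_prime - 5) % 8)   # largest such value <= x_prime
    return base if x_prime - base <= 4 else base + 8


def pseudo_random(p, x_prime, k=32):
    """Yield p * x**n mod 2**k for n = 1, 2, 3, ..."""
    modulus = 1 << k
    x = nearest_5_mod_8(x_prime) % modulus
    value = p % modulus
    while True:
        value = (value * x) % modulus
        yield value


# With k = 8 the cycle has 2**6 = 64 members and then repeats.
cycle = list(islice(pseudo_random(7, 20, k=8), 65))
print(len(set(cycle[:64])), cycle[64] == cycle[0])
```

Dividing each value by $2^k$ would give a variate in the unit interval; how the original programme scaled its numbers is not stated in the text.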

Random normal deviates were produced by summing twelve variates (uniform in the 
range −1/2 < x < 1/2) produced by the above method. Such a deviate will have zero mean and
unit variance. A running total was kept of all deviates of magnitude greater than two, 
together with the total number of deviates used. These deviates were not rejected since 
otherwise an undesirable bias would have been introduced into the process. 
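
The construction of a deviate from twelve uniform variates can be illustrated as follows. For brevity the sketch draws its uniforms from Python's standard generator, not from the congruential method described above; that substitution, the seed and the sample size are assumptions made for illustration.

```python
import random
import statistics


def normal_deviate(rng):
    """Sum of twelve uniforms on (-1/2, 1/2): mean zero, and variance
    12 * (1/12) = 1, since each uniform has variance 1/12."""
    return sum(rng.random() - 0.5 for _ in range(12))


rng = random.Random(1958)                 # arbitrary seed
sample = [normal_deviate(rng) for _ in range(20000)]
large = sum(1 for d in sample if abs(d) > 2.0)

print(statistics.fmean(sample), statistics.pvariance(sample))
print(large, len(sample))                 # running totals of the kind described
```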

For each pair of initial population values in the case of a system with an unstable stationary 
state, the programme was arranged to run through sixty representations of the population 
growth, stopping when one or other of the species had become zero. For each representation
the number of units of time required to reach extinction and the particular species which had
vanished were recorded in the machine. After all sixty trials the probability that species $S_1$
survived and its standard deviation were printed, together with the distribution of the time
to extinction for each species and the observed and expected number of normal deviates
used of magnitude greater than two. (These ‘normal’ deviates are in fact the sum of twelve
uniform variates, as explained above, so that the appropriate percentage point is 0.04455 as
opposed to 0.04550 for the normal distribution (Hall, 1927).)
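
The percentage point quoted for the sum of twelve uniform variates can be checked exactly. The sketch below uses the standard closed form for the distribution function of a sum of n independent uniforms on (0, 1), shifted so that a deviate of magnitude two corresponds to a sum below 4 or above 8; the function name is illustrative.

```python
from fractions import Fraction
from math import comb, factorial


def uniform_sum_cdf(x, n=12):
    """P(U1 + ... + Un <= x) for independent uniforms on (0, 1),
    valid for integer x between 0 and n."""
    total = sum((-1) ** k * comb(n, k) * (x - k) ** n for k in range(x + 1))
    return Fraction(total, factorial(n))


# The two tails are equal by symmetry about the mean of 6.
tail = 2 * uniform_sum_cdf(4)
print(float(tail))
```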

The programmes for the two models differed only in the evaluation of the expression for
var [N(t+1)], so that the modifications required were almost trivial.
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4. THE PROPERTIES OF A SYSTEM WITH AN UNSTABLE STATIONARY STATE 


When the deterministic model has an unstable stationary state, the final outcome of the
interaction depends on the initial state of the system. Consequently, in the stochastic model,
it might be expected that random variations, more particularly in the early stages of
population growth, could be an important factor in deciding which of the two species would
survive (Bartlett, 1957).

In order to investigate this point, the numerical values of the parameters in (1-1) were
taken as

$$\lambda_1 = 2.5:\quad q_1 = 1 + 0.0030\,N_1 + 0.0105\,N_2,$$
$$\lambda_2 = 2.0:\quad q_2 = 1 + 0.0025\,N_2 + 0.0050\,N_1,$$

which, from (1-2), give an unstable stationary state with $L_1 = 150$ and $L_2 = 100$.

Suppose, for example, that in the deterministic model we take $N_1(0) = N_2(0) = 20$, then
by a repeated application of equations (1-1) it appears that given these initial conditions, the
species $S_2$ will always survive and $S_1$ will disappear from the system. (Given other initial
conditions, of course, this would not necessarily be the case.) But in both types of stochastic
model, I and II, it was found that with these same initial conditions, the processes sometimes
went in one direction and sometimes in the other. A few typical ($N_1$, $N_2$) trajectories for
model I are given in Fig. 1. It was clear from these preliminary results that at any point on
the ($N_1$, $N_2$) plane there would be a probability, 0 < p < 1, that ultimately the species $S_1$
would survive and $S_2$ disappear from the system. It was of interest, therefore, to determine
the contour lines of p for the two extreme types of stochastic model.
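
The repeated application of equations (1-1) can be sketched as follows. The sketch assumes that each species' own numbers carry the coefficients 0.0030 and 0.0025, an assignment chosen because it reproduces the stationary state at 150 and 100; the function names and the number of steps are illustrative.

```python
def deterministic_step(n1, n2):
    """One application of (1-1) with the numerical values of this section."""
    q1 = 1 + 0.0030 * n1 + 0.0105 * n2
    q2 = 1 + 0.0025 * n2 + 0.0050 * n1
    return 2.5 * n1 / q1, 2.0 * n2 / q2


def iterate(n1, n2, steps=200):
    """Apply (1-1) repeatedly and return the final pair of population sizes."""
    for _ in range(steps):
        n1, n2 = deterministic_step(n1, n2)
    return n1, n2


print(deterministic_step(150.0, 100.0))   # the stationary state reproduces itself
print(iterate(20.0, 20.0))                # equal small initial numbers
print(iterate(50.0, 20.0))                # a larger start for the first species
```

In the deterministic model a population only tends towards zero and never reaches it, so "disappearance" here means falling below any fixed threshold.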

We give in Table 1 the estimated values of p, in the case of model I, for a number of points
on the plane, each of these estimates being based on 60 replicates. These points can be
regarded either as the initial conditions at t = 0 of some particular system, or as the conditions
at time t of a system which started at some previous time t − a. By a suitable change in the
origin of the time scale for the latter, the two cases become equivalent. The general pattern
of the distribution of p can be seen from the entries in this table. For a fixed value of $N_2(0)$,
p steadily increases with increasing $N_1(0)$ until it becomes 100%. Thus, if the trajectory of
a particular replicate were to reach the region where p ≈ 1, this means that the species $S_1$ is
almost certain to survive, and that the chance of any reversal in the trend is negligibly small.
Conversely, a trajectory reaching the region where p ≈ 0 means that the species $S_2$ is almost
certain to survive.
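
An estimate of p from sixty replicates of model I might be obtained as in the sketch below. It assumes the numerical system with stationary state 150 and 100, real-valued population sizes, Python's own normal generator, and the binomial formula for the standard deviation; none of these details is confirmed by the text, and the names are illustrative.

```python
import math
import random


def model_one_step(n1, n2, rng):
    """One stochastic step of model I: normal draws with variance twice the
    mean, negative values being replaced by zero."""
    q1 = 1 + 0.0030 * n1 + 0.0105 * n2
    q2 = 1 + 0.0025 * n2 + 0.0050 * n1
    out = []
    for mean in (2.5 * n1 / q1, 2.0 * n2 / q2):
        out.append(max(0.0, rng.gauss(mean, math.sqrt(2.0 * mean))))
    return out[0], out[1]


def first_species_survives(n1, n2, rng, max_steps=10000):
    """Run one replicate until either species reaches zero."""
    for _ in range(max_steps):
        n1, n2 = model_one_step(n1, n2, rng)
        if n1 == 0.0 or n2 == 0.0:
            return n1 > 0.0
    raise RuntimeError("no extinction within max_steps")


def estimate_p(n1, n2, replicates=60, seed=0):
    """Proportion of replicates in which the first species survives,
    with its binomial standard deviation."""
    rng = random.Random(seed)
    wins = sum(first_species_survives(n1, n2, rng) for _ in range(replicates))
    p = wins / replicates
    return p, math.sqrt(p * (1.0 - p) / replicates)


for start in [(10, 20), (20, 20), (50, 20)]:
    print(start, estimate_p(*start))
```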

In order to map the contour lines of p, we made use of the empirical observation that for
each value of $N_2(0)$ in Table 1, the probit of p was linearly related to log $N_1(0)$. These probit
lines were calculated in the usual way, and in each case the χ² goodness of fit test was
satisfactory. The only feature of these lines which should be mentioned is that the slopes steadily
increased with increasing $N_2(0)$; in other words, a probit plane could not be fitted to the
results given in Table 1. The contour lines for p = 95, 50 and 5 % which were calculated from
these regressions of probit p on log $N_1(0)$ are given in Fig. 2, the irregularities in the figure
being due to the sampling errors involved in the estimates. The spread of the contour lines is
fan-shaped, and in this numerical system the 50% line passes through the unstable
equilibrium point, $L_1 = 150$, $L_2 = 100$. (The estimated 50 % point for $N_2(0) = 100$ was $N_1(0) = 155$,
with a fiducial range (P = 0.95) of 149-162.) However, there is no reason to think that this
would necessarily be true for all systems with an unstable stationary state.
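
The idea of a probit line can be illustrated with a simplified sketch. It fits an unweighted least-squares line to the normal equivalent deviates of the Table 1 entries read for $N_2(0) = 50$, whereas the usual probit method is an iteratively weighted fit; the assignment of those entries to that column, the choice of base-10 logarithms and the function name are all assumptions of this sketch.

```python
import math
from statistics import NormalDist

# Entries of Table 1 read for N2(0) = 50 (column assignment assumed).
N1_START = [40, 50, 60, 70, 75, 85, 100]
P_PERCENT = [3.33, 20.00, 46.67, 58.33, 68.33, 78.33, 95.00]


def probit_line(doses, percents):
    """Unweighted least-squares line of the normal deviate of p on log10(dose).
    Returns (intercept, slope)."""
    xs = [math.log10(d) for d in doses]
    ys = [NormalDist().inv_cdf(p / 100.0) for p in percents]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


intercept, slope = probit_line(N1_START, P_PERCENT)
median_point = 10 ** (-intercept / slope)   # value of N1(0) at which p = 50 %
print(slope, median_point)
```

The 50 % point does not depend on the base of the logarithm, and adding the conventional 5 to the deviates would shift the intercept without changing the slope.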


Fig. 1. Selected complete trajectories of the population size, model I (the initial population in each
case is $N_1 = N_2 = 20$). [Plot not reproduced: $N_1$ on the horizontal axis, 100 to 500; $N_2$ on the vertical axis, up to 400.]

The results of a similar series of calculations using model II are given in Table 2. For
relatively small values of the initial numbers the probability p appears to be much the same
in the two models (cf. the estimates for $N_2(0) = 20$ and variable $N_1(0)$ in Tables 1 and 2); but
in model II, as $N_2(0)$ increased in magnitude, the slopes of the probit lines became steadily
greater than those for model I, leading to a narrower band of probabilities lying between
p ≈ 0 and p ≈ 1. This is shown in the comparable graph of the 95, 50 and 5% contour lines
of p, given in Fig. 3. This smaller spread is presumably due to the smaller variance of this
model.*


* That the smaller spread of the contour lines in Fig. 3, compared with that in Fig. 2, is due to the
smaller variance of model II, was confirmed accidentally by a set of calculations using model I in
which, through an error in programming, it was assumed that var[$N_a(t+1)$] for each species was equal
to 0.5 E[$N_a(t+1)$], instead of the correct approximation var[$N_a(t+1)$] = 2E[$N_a(t+1)$]. Exactly the
same pattern of contour lines emerged as in Fig. 2, but the fan-shaped spread was very much less.

From the fitting of the probit regression lines for each fixed value of $N_2(0)$ in Tables 1 and 2,
the estimated 50 % points for $N_1(0)$ were the same in the two models, apart from errors of
random sampling, up to $N_2(0) = 125$; but for $N_2(0) = 150$ and 175 they were significantly
less in model II than in model I, leading to the curvature of the contour lines which can be
seen in Fig. 3. The explanation of this difference between the two models appears to be that

Table 1. Estimates of the probability $p$ (%) that the species $S_1$ will survive,
for varying $N_1(0)$ and $N_2(0)$, in the case of model I









































N,(0) 
(0) 
20 50 | 75 100 «=| 125 150 175 

10 5-00 is | whan eae fe | a as 

15 21-67 me se -— ee |g ms ee 

20 51-67 a oe a oe ae a 

25 66-67 — | — — — — 

30 76-67 _ | — — — | — — 

40 93-33 3-33 4 = | = — 

50 100-00 20-00 0-00 = — dob 

60 ia 46-67 os tie | ‘ti sate 

70 —_ 58-33 — sigs a. — 4 stats até 

15 = 68-33 15-00 0-00 0-00 | = 4 

85 a 78-33 te = 0-00 “| C 
100 iano 95-00 35-00 10-00 ae ale 
125 a ie 61-67 25-00 3-33 a 0-00 
150 — — 85-00 38-33 18-33 1-67 
175 st 100-00 95-00 56-67 43-33. | 16-67 a= 
200 nee re = 86-67 56-67 33-33 5-00 
225 vane “ = 93-33 83-33 50-00 15-00 
250 i sn | 100-00 mn 93-33 | 66-67 31-67 
275 — | - — — 95-00 78-33 58-33 
300 _- — — 100-00 aso | 93-33 70-00 
325 ve a i Pa ane — | oer 80-00 
350 _ | west | = oe | 96-67 | a 90-00 | 





Each estimate of p is based on 60 replicates. 


in this particular numerical system the processes may be involved with the discontinuities in
model II, which are due to the restriction that the birth-rate of each species becomes zero
when $q_a = e^{b_a}$ $(a = 1, 2)$. When $q_a < e^{b_a}$ the expected numbers $E[N_a(t+1)]$ are the same in both
models for given $N_1(t)$ and $N_2(t)$; but if this is not the case, then the expectations are different.
To take as an example a typical point in this region of the plane, suppose that $N_1(0) = 225$
and $N_2(0) = 175$. Then, from (1.1) and (1.4) we have

$$q_1 = 3.5125 > e^{b_1} = 2.8033,$$
$$q_2 = 2.5625 < e^{b_2} = 2.7409.$$

Hence from (1.5) and (1.6) the expected numbers at $t = 1$ are $E[N_1(1)] = 201$, $E[N_2(1)] = 137$
in the case of model II, and $E[N_1(1)] = 160$, $E[N_2(1)] = 137$ in model I. Thus, the trajectories

Table 2. Estimates of the probability $p$ (%) that the species $S_1$ will survive,
for varying $N_1(0)$ and $N_2(0)$, in the case of model II









































N,(0) 
| ,(0) 
| 
| 20 | 50 | 75 100 125 150 175 
| | | 
| 10 6-67 | _— | _ — — — — 
15 18:33 | ~ | — — — — — 
20 58:33 — | — — + -- ~- 
| 25 70-00 one _ _ — — — 
30 80:00 | 0-00 — — — — — 
| 40 95-00 | 1-67 | — _ — a a 
| 60 100-00 167 | — | — -- — — 
60 — 31-67 | 0:00 | — os — — 
| 70 — 58:33 | _ -- — -- — 
| 75 — 68-33 | 5-00 — — -- — 
| 100 ~ 100-00 | 30-00 0-00 _ — — 
110 — — | 68-33 — | — — — 
125 - | 1000 | O00 | — _ 
150 — — | 10000 | 55-00 5:00 | 0-00 — 
175 - —- | — | 91-67* | 16-67 | 3-33 0-00 
190 _ — — ; o— | 55:00 | — _ 
200 ~ re | 99-17* | 70:00 | 25-00 8-89+ 
220 — = | — | = | ae | facos 41-67 
225 —-— foe | 100-00 | 100-00 | 83-33 58-33 
| 250 — | _ | — 100-00 | — | 98-33 98-33 
275 _ | — 100-00 | — | 100-00 100-00 
300 — _ | — — — | 100-00 a 
| | | 
* Estimate based on 120 replicates. 
+ Estimate based on 180 replicates. 
All other estimates based on 60 replicates. 
Fig. 2. Contour lines for percentage probability that $S_1$ survives and $S_2$ disappears, model I.

for the two models start off by following, on the average, different paths, and the divergence
between them becomes greater with increasing time. From the direction of the difference
between the mean paths for given $N_2$ (e.g. $E[N_1(1)]$ for II > $E[N_1(1)]$ for I, given $E[N_2(1)] = 137$),
we should expect, therefore, that the probability $p$ associated with these initial conditions
would be greater in the case of model II than in model I. Since the majority of the points
tested for fixed $N_2(0) = 150$ and 175 fell in the same region where $q_1 > e^{b_1}$, $q_2 < e^{b_2}$, there would
tend to be a greater probability of the species $S_1$ surviving in the constant death-rate type of
model for such initial conditions of this numerical system.

Fig. 3. Contour lines for percentage probability that $S_1$ survives and $S_2$ disappears, model II.


The difference between the two models for these rather extreme initial states of the system 
in relation to the unstable stationary point is possibly of less interest, however, than the 
agreement between them for the remaining points tested. It can be seen from Figs. 2 and 3 
that, broadly speaking, the general pattern of the contour lines of p in the remaining regions 
of the plane is very similar for the two models, apart from the degree of spread, and we can 
infer that the results for all the other possible types of stochastic model which have this same 
deterministic equivalent would fall somewhere in between the results for these two limiting 
cases. 


5. RESULTS FOR A SYSTEM WITH A STABLE STATIONARY STATE 


As a contrast to this type of system, suppose we take the case of a stable stationary state
with $L_1 = 150$ and $L_2 = 100$, for example if in (1.1) we have

$$\lambda_1 = 2.5:\quad q_1 = 1 + 0.00800N_1 + 0.00300N_2,$$
$$\lambda_2 = 2.0:\quad q_2 = 1 + 0.00625N_2 + 0.00250N_1.$$

Two realizations were calculated for this system using model I, because of the saving of
machine time in the case of this model, and also because of its greater variance. Taking the
initial state of the system as $N_1(0) = N_2(0) = 20$, the values of $N_1(t)$ and $N_2(t)$ were printed off
at each step in the calculations, the first realization being computed up to $t = 149$, and the
second up to $t = 70$.
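As a check on the stationary state quoted for this system, the deterministic part of the process can be iterated directly. The sketch below assumes the deterministic recurrence has the form $N_a(t+1) = \lambda_a N_a(t)/q_a(t)$, which is the form of $f_1$ used later in this section; the random terms of model I are deliberately left out, and the function names are hypothetical.

```python
def step(n1, n2):
    """One step of the deterministic recurrence N_a(t+1) = lambda_a * N_a(t) / q_a(t)."""
    q1 = 1 + 0.00800 * n1 + 0.00300 * n2
    q2 = 1 + 0.00625 * n2 + 0.00250 * n1
    return 2.5 * n1 / q1, 2.0 * n2 / q2


def iterate(n1, n2, steps):
    """Apply `step` repeatedly, starting from (n1, n2)."""
    for _ in range(steps):
        n1, n2 = step(n1, n2)
    return n1, n2


# Start from the initial state used for the two realizations.
n1, n2 = iterate(20.0, 20.0, 200)
print(round(n1, 3), round(n2, 3))
```

At $(150, 100)$ both $q_1 = \lambda_1$ and $q_2 = \lambda_2$, so the point is left unchanged by a step, and the iteration from $(20, 20)$ settles there.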

These processes rapidly approached their equilibrium levels, in the region of which they 
then continued to fluctuate in an irregular fashion. As an illustration, the results for the 
second realization are given in Fig. 4, neglecting the first six steps in the calculation during

Table 3. Frequency distribution of the observed $N_1$ and $N_2$ when the processes
were in the region of the stable stationary state

Fig. 4. Fluctuations around the stable stationary state in the case of the second replicate.
Broken line, species $S_1$ (equilibrium level $N_1 = 150$); continuous line, species $S_2$ (equilibrium level
$N_2 = 100$).




























mone | 








Nz 
N, ; - Total 
| | | | 
| 30- | 40- | 50- | 60- | 70- | 80—- | 90— | 100- | 110- aa 130- | 140- 
| 1 
| | | | 
ew a _— a | — 2\;— _— —|— — 2 
met — Pe ee ee Pe ee 3 yom ar. 
}ll0-] — | —}| —]}] —]—] 1 1 2 4 2; 2|}/— | 
}1z0-| — | —| — »j}—j] 1 To 4 4}; 2]— | 2% 
lase-|] — | — l l 1 8 ,) 3% 5 3 | 2 3 | 29 
140-| — — 1 1 2 6 8 | 11 6 ei. 4 — 39 
|150-| 2 1 4 4}; — 4 4/ 5 3 el tis 1S 
| 160- — l 4 3 6 et 4 5 4/3a];— 31 
imei — | — 3 2 2\|— 6 3 Cb we feed 18 
|} 180-| — — 2 — eee A — — 1 — —i— | 4 
190- | — 2 rj rat] —|{— 2;/—/;—|]—|-— | 
| 200- | - _ 1 1 


























otal 











325 


which time an approach was being made to the steady state, $L_1 = 150$, $L_2 = 100$. It will be
seen that although these processes fluctuate around the latter, there is at times a tendency
in both cases for a drift to occur away from this region. Thus, in Fig. 4, the numbers of the
second species, after fluctuating somewhat above the equilibrium level from $t = 20$ until
about $t = 35$, then started a slow drift towards the base-line, but later recovered so that at
the time the calculations were stopped, the numbers of this species had again returned to the
region of the steady state. During this time the numbers of the first species tended to drift in
the opposite direction. The same phenomenon, in a varying degree, was also apparent in
the results for the first realization, and most probably it is associated with the negative
correlation which exists between the numbers of the two species.

We give in Table 3 the observed bivariate distribution of $N_1$ and $N_2$ for the combined data
for the two realizations in the region of the stationary state. From this table we have
$\bar{N}_1 = 148.3$ and $\bar{N}_2 = 97.2$; while var($N_1$) = 553.8, var($N_2$) = 570.3 and cov($N_1$, $N_2$) = -243.4.
The type of distribution seems to be approximately normal in form. Thus, if we calculate
the expected marginal distributions from the given estimates of the means and variances,
we have:

  N_1      Expected  Observed        N_2      Expected  Observed
  <90         1.4        0           <30         0.5        0
  90-         2.8        2           30-         1.2        2
  100-        6.6        9           40-         3.2        3
  110-       13.0       12           50-         7.4       13
  120-       21.3       20           60-        13.9       15
  130-       29.8       29           70-        22.6       10
  140-       34.6       39           80-        30.2       28
  150-       33.3       31           90-        34.1       33
  160-       27.3       31           100-       32.5       33
  170-       18.8       18           110-       26.1       33
  180-       10.2        4           120-       17.6       23
  190-        5.0        6           130-       10.1       10
  200-        2.0        3           140-        4.8        4
  210-        0.7        1           150-        1.9        0
  ≥220        0.2        2           ≥160        0.9        0
  Total     207.0      207           Total     207.0      207

Pooled classes: $N_1 < 110$, expected 10.8, observed 11; $N_1 \geq 190$, expected 7.9, observed 12;
$N_2 < 60$, expected 12.3, observed 18; $N_2 \geq 140$, expected 7.6, observed 4.


























For the marginal distribution of $N_1$, we have $\chi^2 = 7.3$ for 7 d.f., a perfectly reasonable
value to obtain; while for the distribution of $N_2$, $\chi^2 = 15.1$ (7 d.f.), a somewhat excessive
value which is due very largely to the deficiency of the observed values in the $N_2$ = 70-79
class.
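The two values of $\chi^2$ can be checked from the marginal frequencies. The sketch below is illustrative only and rests on assumptions: several entries of the printed table are hard to read, and the values used here for them (expected 6.6 and observed 9 for $N_1$ = 100-, observed 31 and 18 for $N_1$ = 150- and 170-, expected 0.2 and observed 2 for $N_1 \geq 220$, observed 4 for $N_2$ = 140-) are those implied by the pooled figures and the totals of 207. The pooling of the first four $N_2$ classes is likewise an assumption, chosen because it gives ten classes and hence 7 d.f. The function name is hypothetical.

```python
def pooled_chi_square(expected, observed, head, tail):
    """Chi-square after pooling the first `head` and the last `tail` classes.

    Degrees of freedom are (number of pooled classes) - 3, since the total,
    the mean and the variance are all taken from the data.
    """
    def pool(values):
        cut = len(values) - tail
        return [sum(values[:head])] + list(values[head:cut]) + [sum(values[cut:])]

    exp_pooled, obs_pooled = pool(expected), pool(observed)
    chi2 = sum((o - e) ** 2 / e for e, o in zip(exp_pooled, obs_pooled))
    return chi2, len(exp_pooled) - 3


# Classes <90, 90-, 100-, ..., 210-, >=220 for N_1.
n1_expected = [1.4, 2.8, 6.6, 13.0, 21.3, 29.8, 34.6, 33.3,
               27.3, 18.8, 10.2, 5.0, 2.0, 0.7, 0.2]
n1_observed = [0, 2, 9, 12, 20, 29, 39, 31, 31, 18, 4, 6, 3, 1, 2]

# Classes <30, 30-, 40-, ..., 150-, >=160 for N_2.
n2_expected = [0.5, 1.2, 3.2, 7.4, 13.9, 22.6, 30.2, 34.1,
               32.5, 26.1, 17.6, 10.1, 4.8, 1.9, 0.9]
n2_observed = [0, 2, 3, 13, 15, 10, 28, 33, 33, 33, 23, 10, 4, 0, 0]

chi2_n1, df_n1 = pooled_chi_square(n1_expected, n1_observed, head=3, tail=4)
chi2_n2, df_n2 = pooled_chi_square(n2_expected, n2_observed, head=4, tail=3)
print(f"N_1: chi-square = {chi2_n1:.1f} on {df_n1} d.f.")
print(f"N_2: chi-square = {chi2_n2:.1f} on {df_n2} d.f.")
```

With these inputs the $N_2$ = 70- class alone contributes about 7 to the second total, which is the deficiency referred to in the text.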

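It is of interest to compare the variances of these observed fluctuations in the region of the
stationary state with those expected on the theory of small fluctuations for the discrete time
model used here.* This model may be written as

$$N_a(t+1) = f_a(N_1, N_2, t) + Z_a(t+1) \qquad (a = 1, 2),$$

* We are indebted to Prof. M. S. Bartlett for pointing out to us the following method of
determining the theoretical variances and covariance for the discrete time model.

where the first term is the deterministic part of the process given by (1.1), and the
$Z_a(t+1)$ $(a = 1, 2)$ are independent normal variables with zero means and variances,
in the case when it is assumed that the birth-rate of each species remains constant,
$\sigma^2(Z_a) \simeq 2E[N_a(t+1)]$. In the region of the stationary state

$$\delta N_a(t+1) = N_a(t+1) - L_a,$$

and

$$\sigma_a^2 = E\{(f_a - L_a)^2\} + \sigma^2(Z_a),$$
$$\operatorname{cov}(N_1, N_2) = \operatorname{cov} = E\{(f_1 - L_1)(f_2 - L_2)\}.$$

Hence we have the set of equations for determining the variances and covariance of $N_1$ and
$N_2$ when the fluctuations are regarded as small,

$$\sigma_1^2 = \left(\frac{\partial f_1}{\partial N_1}\right)^2\sigma_1^2 + 2\left(\frac{\partial f_1}{\partial N_1}\right)\left(\frac{\partial f_1}{\partial N_2}\right)\operatorname{cov} + \left(\frac{\partial f_1}{\partial N_2}\right)^2\sigma_2^2 + \sigma^2(Z_1),$$

$$\sigma_2^2 = \left(\frac{\partial f_2}{\partial N_1}\right)^2\sigma_1^2 + 2\left(\frac{\partial f_2}{\partial N_1}\right)\left(\frac{\partial f_2}{\partial N_2}\right)\operatorname{cov} + \left(\frac{\partial f_2}{\partial N_2}\right)^2\sigma_2^2 + \sigma^2(Z_2),$$

$$\operatorname{cov} = \left(\frac{\partial f_1}{\partial N_1}\right)\left(\frac{\partial f_2}{\partial N_1}\right)\sigma_1^2 + \left[\left(\frac{\partial f_1}{\partial N_1}\right)\left(\frac{\partial f_2}{\partial N_2}\right) + \left(\frac{\partial f_1}{\partial N_2}\right)\left(\frac{\partial f_2}{\partial N_1}\right)\right]\operatorname{cov} + \left(\frac{\partial f_1}{\partial N_2}\right)\left(\frac{\partial f_2}{\partial N_2}\right)\sigma_2^2,$$

which are to be evaluated for $N_a = L_a$. Thus, to take the first species as an example, we have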


from

$$f_1 = \frac{\lambda_1 N_1}{1 + \alpha_1 N_1 + \beta_1 N_2},$$

$$\left(\frac{\partial f_1}{\partial N_1}\right)_{L_1, L_2} = 1 - \frac{\alpha_1 L_1}{\lambda_1}; \qquad \left(\frac{\partial f_1}{\partial N_2}\right)_{L_1, L_2} = -\frac{\beta_1 L_1}{\lambda_1};$$

and similar expressions in terms of $\lambda_2$, $\alpha_2$, $\beta_2$ and $L_2$ from the function $f_2$ for the second species.
Given the numerical values of the parameters in the present example, the equations are

$$0.7296\,\sigma_1^2 + 0.1872 \operatorname{cov} - 0.0324\,\sigma_2^2 = 300,$$
$$-0.015625\,\sigma_1^2 + 0.171875 \operatorname{cov} + 0.52734375\,\sigma_2^2 = 200,$$
$$0.065\,\sigma_1^2 + 0.62 \operatorname{cov} + 0.12375\,\sigma_2^2 = 0,$$

whence $\sigma_1^2 = 465.5$, $\sigma_2^2 = 437.4$ and cov = -136.1. These expected variances and
covariance are less than those actually observed, viz. var($N_1$) = 553.8, var($N_2$) = 570.3 and
cov($N_1$, $N_2$) = -243.4; but, apart from the question of the sampling errors associated with
the observed values, the agreement seems to be reasonable when we consider that the
expected values are based on the theory of small fluctuations, and so cannot be exactly
correct.
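The numerical equations and their solution can be reproduced from the parameters of the system. The following is a minimal sketch rather than the authors' computation: it evaluates the partial derivatives at the stationary state from the expressions given for $f_1$ (and their analogues for $f_2$), takes $\sigma^2(Z_a) = 2L_a$, and solves the three linear equations by Gaussian elimination. The function names are hypothetical.

```python
def solve_linear(matrix, rhs):
    """Solve a small dense linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    solution = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * solution[c] for c in range(r + 1, n))
        solution[r] = (aug[r][n] - tail) / aug[r][r]
    return solution


def small_fluctuation_moments(lam1, alpha1, beta1, lam2, alpha2, beta2, L1, L2):
    """Return (var N_1, var N_2, cov) for the discrete time model near (L1, L2)."""
    f11 = 1 - alpha1 * L1 / lam1   # d f_1 / d N_1
    f12 = -beta1 * L1 / lam1       # d f_1 / d N_2
    f22 = 1 - alpha2 * L2 / lam2   # d f_2 / d N_2
    f21 = -beta2 * L2 / lam2       # d f_2 / d N_1
    # Unknowns ordered as (var N_1, cov, var N_2); right-hand sides are sigma^2(Z_a) = 2 L_a.
    matrix = [
        [1 - f11 ** 2, -2 * f11 * f12, -f12 ** 2],
        [-f21 ** 2, -2 * f21 * f22, 1 - f22 ** 2],
        [-f11 * f21, 1 - (f11 * f22 + f12 * f21), -f12 * f22],
    ]
    var1, cov, var2 = solve_linear(matrix, [2.0 * L1, 2.0 * L2, 0.0])
    return var1, var2, cov


var1, var2, cov = small_fluctuation_moments(
    2.5, 0.00800, 0.00300, 2.0, 0.00625, 0.00250, 150.0, 100.0)
print(f"var N_1 = {var1:.1f}, var N_2 = {var2:.1f}, cov = {cov:.1f}")
```

The rows of `matrix` are the three equations with all terms in the unknowns moved to the left-hand side, so their entries are the coefficients 0.7296, 0.1872, -0.0324 and so on quoted in the text.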

It is perhaps worth noting that if we were to derive the theoretical variances from the type
of continuous time model suggested by Bartlett (1955, 1957; cf. also Whittle, 1957) for
studying the properties of such stochastic processes in the region of the stationary state, the
discrepancy between expected and observed would become much greater. Thus, for a small
interval of time $\delta t$, the equivalent continuous time model for two competing species may be
written,

$$\left.\begin{aligned}
\delta N_1 &= (r_1 N_1 - a_1 N_1^2 - k_1 a_1 N_1 N_2)\,\delta t + \delta Y_1 - \delta Z_1, \\
\delta N_2 &= (r_2 N_2 - a_2 N_2^2 - k_2 a_2 N_1 N_2)\,\delta t + \delta Y_2 - \delta Z_2,
\end{aligned}\right\} \tag{5.1}$$

where $\delta Y_i$, $\delta Z_i$ $(i = 1, 2)$ are independent, modified, Poisson variables with zero means, and
variances, in the case when it is assumed that the birth-rate of each species remains constant,

$$\operatorname{var}(\delta Y_i) = b_i N_i\,\delta t \qquad (i = 1, 2).$$

In the region of the stationary state suppose that $N_i = L_i(1 + u_i)$ $(i = 1, 2)$, then for small
$u_i$ we have from (5.1) the linear stochastic system,

$$\left.\begin{aligned}
\delta u_1 &= -a_1(L_1 u_1 + k_1 L_2 u_2)\,\delta t + \delta\xi, \\
\delta u_2 &= -a_2(L_2 u_2 + k_2 L_1 u_1)\,\delta t + \delta\zeta,
\end{aligned}\right\} \tag{5.2}$$

where, for the constant birth-rate type of model, $\delta\xi$ and $\delta\zeta$ have variances $(2b_1/L_1)\,\delta t$ and
$(2b_2/L_2)\,\delta t$, respectively. Forming the equations $u_1 + \delta u_1$ and $u_2 + \delta u_2$ from (5.2), we obtain
by squaring, taking the cross-product and averaging, the following expressions for the
variances and covariance ($\sigma_1^2$, $\sigma_2^2$ and $\sigma_{12}$) of $u_1$ and $u_2$. If we write $b_1/(a_1 L_1) = X_1$ and
$b_2/(a_2 L_2) = X_2$, then

$$L_1\sigma_1^2 = X_1 - k_1 L_2\sigma_{12},$$
$$L_2\sigma_2^2 = X_2 - k_2 L_1\sigma_{12},$$

while the covariance is given by

$$(k_1 k_2 - 1)(a_1 L_1 + a_2 L_2)\,\sigma_{12} = a_2 k_2 X_1 + a_1 k_1 X_2. \tag{5.3}$$


The values of $L_1$ and $L_2$ are the same in both types of model, while the relationship between
the remaining parameters, in terms of those for the discrete time model, is given by
$a_i = \alpha_i \log_e \lambda_i/(\lambda_i - 1)$ and $k_i = \beta_i/\alpha_i$ $(i = 1, 2)$. Since in this particular numerical system
$b_1 \simeq b_2 \simeq 1$, we have from (5.3), expressing these variances and covariance in the form
var($N_i$) = $L_i^2$ var($u_i$), $\sigma_1^2 = 242.1$, $\sigma_2^2 = 270.7$ and cov = -99.8. Thus it is evident that the
theoretical variances of small fluctuations about the steady state are smaller for the
continuous time model than for the discrete case.*
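The continuous time values can be evaluated in the same way from (5.3) and the stated parameter relations. This is a sketch under one explicit assumption: $b_1$ and $b_2$ are set to exactly 1, whereas the text only says they are approximately 1, so the last decimal place may differ slightly from the printed figures. The function name is hypothetical.

```python
from math import log


def continuous_time_moments(lam1, alpha1, beta1, lam2, alpha2, beta2,
                            L1, L2, b1=1.0, b2=1.0):
    """Return (var N_1, var N_2, cov) from equations (5.3).

    b1 = b2 = 1 is an assumption standing in for 'approximately 1'.
    """
    a1 = alpha1 * log(lam1) / (lam1 - 1)
    a2 = alpha2 * log(lam2) / (lam2 - 1)
    k1 = beta1 / alpha1
    k2 = beta2 / alpha2
    X1 = b1 / (a1 * L1)
    X2 = b2 / (a2 * L2)
    # Covariance of u_1 and u_2, then the two variances.
    s12 = (a2 * k2 * X1 + a1 * k1 * X2) / ((k1 * k2 - 1) * (a1 * L1 + a2 * L2))
    s1 = (X1 - k1 * L2 * s12) / L1
    s2 = (X2 - k2 * L1 * s12) / L2
    # Convert from u_i to N_i = L_i (1 + u_i).
    return L1 ** 2 * s1, L2 ** 2 * s2, L1 * L2 * s12


ct_var1, ct_var2, ct_cov = continuous_time_moments(
    2.5, 0.00800, 0.00300, 2.0, 0.00625, 0.00250, 150.0, 100.0)
print(f"var N_1 = {ct_var1:.1f}, var N_2 = {ct_var2:.1f}, cov = {ct_cov:.1f}")
```

The results lie close to the quoted 242.1, 270.7 and -99.8, and well below both the discrete time values and the observed variances.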

It appears, then, that in a system of two competing species with a stable stationary state, 
the number of individuals over a relatively long period of time settles down to a type of 
distribution which is approximately normal in form, but with a degree of variation which 
may be greater than that expected for small deviations about the stable state. This greater 
degree of variation about the equilibrium level can only lead to an increased chance of 
random extinction of one or other of the two species. 

No calculations were carried out for this system using model II, but it is quite clear that
because of the smaller variance of this model, there would have been a smaller degree of
variation about the steady state (cf. the results for a logistic process using these two types
of stochastic model (Leslie, 1958, § 8)). Other things being equal, therefore, the chances of
random extinction would be less for the second type of model, in which it is assumed that the
death-rate of each species remains constant.

* This point also arises in regard to some comparisons which have been made (Leslie, 1958) between
the theoretical and observed variances of a number of logistic processes fluctuating in the region of
the upper asymptote. A number of replicates were observed over a relatively short period of time,
and the estimated variances were based essentially on the pooled 'Within replicates' mean squares.
These variances were somewhat greater than those expected from the continuous time model. It
appears now that a better agreement would have been obtained if the observed variances had been based
on the 'Total' sums of squares and the theoretical variances had been calculated for the discrete time
model. For a logistic process these theoretical variances are given by $\sigma^2 = 2bK\lambda^2/(\lambda^2 - 1)$, when the
birth-rate remains constant, and by $\sigma^2 = 2dK\lambda^2/(\lambda^2 - 1)$ when the death-rate remains constant, where
in each case $K$ is the upper asymptote in numbers. However, any revision of the estimates given
(Leslie, 1958, § 8) would not affect the principal conclusions, namely that in the particular processes
studied, the chance of extinction was negligible for any given time interval.


6. WHEN ONE SPECIES ALWAYS PERSISTS 
The question remains as to the effect of random fluctuations in the other possibilities which
arise in the deterministic model. Thus, if in (1.2) we put

$$\frac{\beta_1(\lambda_2 - 1)}{\alpha_2(\lambda_1 - 1)} = y, \qquad \frac{\beta_1\beta_2}{\alpha_1\alpha_2} = x,$$

then when $y < 1$, $x > y$, the species $S_1$ will always persist and $S_2$ will disappear from the
system. When $y > 1$, $x > y$, we have the case of the unstable stationary state, and in the
deterministic model there is a sharp demarcation between these possibilities when $y = 1$. In
the stochastic model, however, these demarcations must be interpreted more liberally. For
instance, if we take the parameters of the system to be

$$\lambda_1 = 2.5:\quad q_1 = 1 + 0.00300N_1 + 0.00375N_2,$$
$$\lambda_2 = 2.0:\quad q_2 = 1 + 0.00250N_2 + 0.00500N_1,$$

then $y = 1$, $x = 2.5$, and there is an unstable stationary state with $L_1 = 0$, $L_2 = 400$. In the
case of the deterministic model, the species $S_1$ would always persist, whatever the initial
conditions of the system; but, in the stochastic model, although the probability of the
species $S_1$ persisting will be $p \simeq 1$ over most of the ($N_1$, $N_2$) plane, there is still a region where
there is a non-zero probability $q = 1 - p$ that the outcome of the interaction will be reversed.
For instance, the following were the estimates of $p$ (%), based in each case on 60 replicates,
for the stated values of $N_1(0)$ and $N_2(0)$ in a system with the above parameters, using model I.


Values of p (%)

                        N_2(0)
  N_1(0)      150       200       350       400
    25       98.33     95.00     81.67     60.00
    50      100.00     98.33     93.33     90.00
   100      100.00    100.00    100.00    100.00























These results for the borderline case suggest that by making progressive changes in the
assumed values of the parameters, it would not be difficult to arrive at a system with $y < 1$,
$x > y$, in which the properties of the deterministic model would be changed in no way by
random variations. Similarly, for the case of $y > 1$, $x < y$, which in the deterministic model
means that the species $S_2$ will always persist.
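The demarcations discussed in this section can be expressed as a small classifier. This is a sketch under stated assumptions: $y$ and $x$ are taken as $\beta_1(\lambda_2 - 1)/\{\alpha_2(\lambda_1 - 1)\}$ and $\beta_1\beta_2/(\alpha_1\alpha_2)$, which reproduce $y = 1$, $x = 2.5$ for the borderline system; the fourth case ($y < 1$, $x < y$) is not stated in this section and is labelled stable here only because the system of § 5 falls in it. The function names and labels are hypothetical.

```python
from math import isclose


def competition_indices(lam1, alpha1, beta1, lam2, alpha2, beta2):
    """Return (x, y) for the deterministic two-species model."""
    y = beta1 * (lam2 - 1) / (alpha2 * (lam1 - 1))
    x = beta1 * beta2 / (alpha1 * alpha2)
    return x, y


def classify(x, y, tol=1e-9):
    """Deterministic outcome implied by x and y."""
    if isclose(y, 1.0, abs_tol=tol) or isclose(x, y, abs_tol=tol):
        return "borderline"
    if y < 1:
        # The 'stable' branch is an inference from the example of section 5.
        return "S1 always persists" if x > y else "stable stationary state"
    return "unstable stationary state" if x > y else "S2 always persists"


# Borderline system of this section.
x_b, y_b = competition_indices(2.5, 0.00300, 0.00375, 2.0, 0.00250, 0.00500)
# System of section 5, which has a stable stationary state.
x_s, y_s = competition_indices(2.5, 0.00800, 0.00300, 2.0, 0.00625, 0.00250)
print(classify(x_b, y_b), "|", classify(x_s, y_s))
```

In the stochastic model these categories are not sharp: the table of $p$ for the borderline system shows a non-zero chance of reversal even where the deterministic outcome is certain.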


7. CONCLUSIONS 
We may conclude from the results of these experiments that the most important difference
between the properties of this stochastic model and its deterministic equivalent is in the case
of a system with an unstable stationary state. In the deterministic model this implies that
the outcome will depend on the initial state of the system, and that for any given state the
particular outcome is then certain to occur. In the stochastic model, however, there is
associated with the state of this type of system at any time $t$, a probability $p$ that ultimately
one of the species will survive and the other become extinct, and a probability $q = 1 - p$ that
the outcome of the interaction will be reversed.
This feature of the stochastic model is qualitatively very similar to the phenomenon
observed by Park (1954) in competing populations of the flour beetles, Tribolium castaneum
and T. confusum. The initial conditions adopted in his original experiments (2♂, 2♀ adults of
each species) were the same for all replicates, and batches of replicates were observed under
six different combinations of temperature and relative humidity. In four of his physical
treatments, T. castaneum survived in a certain proportion $p$ of the replicates, and T. confusum
survived in the remaining $q = 1 - p$, the value of $p$ varying according to treatment
(Park, 1954, table 12, treatments II-V). By plotting the trajectories of the individual
replicates for these four treatments, Neyman, Park & Scott (1956) were able, in each case,
to divide the ($N_1$, $N_2$) plane empirically into three zones. The two outer of these were
'determinate', since it appeared that if the trajectory of a replicate reached one or other of
these zones, only one consequence was then possible, while in between there was an
'indeterminate' zone in which the process might still go either in one direction or the other,
though not with an equal probability. Their figures for these zones (Neyman et al. p. 58)
show a very similar type of fan-shaped pattern to the graphs of the contour lines of $p$, which
we have given here in Figs. 2 and 3. A further noteworthy point is the relative consistency
of the observed values of $p$ in different experiments with these species, when the populations
were initiated with the same number of individuals and kept under similar physical
conditions (Park & Lloyd, 1955, table 1). In a more recent paper, Park (1957) has examined the
relation of the initial numbers to the competitive outcome in populations of these species
kept at 34° C., 70 % R.H., conditions under which T. castaneum always won and T. confusum
was eliminated in his original series of experiments. The results were that out of five different
combinations of initial numbers (in terms of eggs) at $t = 0$, T. castaneum still won in every
replicate for four of the combinations; but in the remaining one, the outcome was reversed
in some cases, T. confusum winning in five out of the fourteen replicates observed. Thus, the
same phenomenon was realized experimentally when the initial conditions of the system
were varied.

Clearly, there is a close analogy between the results observed in these experimental
populations and the qualitative properties of this model for two competing species, when the
stationary state is unstable. But this analogy cannot be taken as direct evidence that the
phenomena observed by Park were due to the existence of this type of stationary state in his
competitive systems: it can only be regarded as suggestive. In order to decide whether or
not this was the case, some quantitative comparisons between the theoretical model and the
observed data would be necessary, and for this the type of model used here is of much too
simple a form. In its development we have neglected the effect of a changing age distribution
on population growth, and factors such as the mutual cannibalism of eggs and pupae by
certain age groups, which are known to occur in the case of Tribolium and which necessarily
must have an important effect on the growth in numbers and the interaction between these
two species. Nevertheless, the results for this simple stochastic model indicate some of the
possibilities which may also arise in the case of the more complex models for this type of
interaction.


A PROBLEM IN THE COMBINATION OF ACCIDENT FREQUENCIES


By J. C. TANNER 
Road Research Laboratory, Department of Scientific and Industrial Research 


The paper is concerned with the analysis of road accident frequencies before and after similar changes 
in road conditions are made at each of a number of sites. At any one site it is assumed that the total 
number of accidents recorded is binomially distributed between the before period and the after period, 
the parameter of the distribution depending on the relative lengths of the periods as well as on the 
effect of the change. A method of estimating the average effect of the changes is proposed. It is 
shown how the accuracy of this estimate depends not only on the chance variations arising from the 
smallness of the accident frequencies but also on any real differences that may exist between the 
effects of the changes at different places. The methods proposed are illustrated by a numerical example. 


1. THE PROBLEM 


To estimate the effect of a change in road conditions at a particular site on the frequency of
accidents there, the usual procedure is to obtain details of accidents at the site in convenient
periods before and after and to compare the ratio after to before with the corresponding ratio
for a large control area. The latter may be the whole of the police district in which the site lies,
or some other area from which trends due to external factors can be reliably assessed. The
significance of the difference between the two ratios can be tested in the usual way by means
of $\chi^2$ with 1 degree of freedom (see § 3 below).

Frequently, however, one wishes to combine the data from a representative sample of
changes of a given type, since the frequencies for any single change are usually too small to
enable useful conclusions to be drawn. This raises three problems. In the first place, unless
all the before periods, and also all the after periods, are of the same length (or, more generally,
unless the 'control ratios' after to before are the same at all sites), it is not immediately obvious
how the average effect of the type of change concerned should be estimated. Secondly, it is
desirable to test whether the effect of a given type of change is the same at all sites. Thirdly,
if there is reason to suppose that it varies then complications arise in testing the significance
of the average change.

Expanding on the third point, one of two null hypotheses can be tested: either 
that over the set of changes actually studied there was no effect on the total expected 
frequency of accidents, or that over the population of changes of which those actually 
studied form a sample, there was, on the average, no change in the frequency of accidents. 
The latter test is on the whole more useful, and is the only one dealt with in detail in this 
paper. 

This paper discusses these matters and suggests appropriate methods of estimation and 
significance tests. These methods have already been applied extensively to practical prob- 
lems, a general survey of which has been given by Garwood & Tanner (1956). 


2. NOTATION

$N$    Number of sites from which data are to be combined.
$b_i$  Number of accidents in the before period at site $i$ ($i = 1, 2, \ldots, N$).
$a_i$  Number of accidents in the after period at site $i$ ($i = 1, 2, \ldots, N$).
$C_i$  Ratio of accidents after to before in the control area for site $i$ (assumed free from error).
$n_i$  $= a_i + b_i$.
$k_i$  $= a_i/(b_i C_i)$. This measures the apparent effect of the change at site $i$. It is the ratio of
       accidents after to the number that would have been expected if the change had no effect.

It is assumed throughout that $b_i$ and $a_i$ are drawn from a binomial distribution:

$$\left(\frac{1}{1 + \kappa_i C_i} + \frac{\kappa_i C_i}{1 + \kappa_i C_i}\right)^{n_i},$$

in which $n_i$ is regarded as fixed. $\kappa_i$ is the 'true' value of $k_i$, i.e. the value that $k_i$ would take
if $b_i$ and $a_i$ took their expected values. From some points of view it would have been
preferable to assume that $b_i$ and $a_i$ followed independent Poisson distributions with means $\mu_i$
and $\kappa_i C_i \mu_i$. The extra parameters $\mu_i$ would, however, have complicated the analysis, and the
results would have been similar, or in some cases identical.

Throughout this paper, the summation sign $\sum$ denotes summation over sites, from $i = 1$ to
$i = N$. These limits are omitted to save space. In part of the Appendix, the suffixes $i$ are also
omitted. All logarithms used in the paper are to base $e$.





3. ANALYSIS FOR A SINGLE SITE 


When data from only one site are available, there is little choice of method. The obvious
procedure is to use $k = a/(bC)$ as an estimate of $\kappa$. A value of $k$ greater than unity denotes an
increase compared with the control area, while a value less than unity denotes a decrease.

To test the significance of the change, one can calculate $\chi^2$ with one degree of freedom in
the usual way, as follows:

$$\chi^2 = \frac{\left(b - \dfrac{a+b}{1+C}\right)^2}{\dfrac{a+b}{1+C}} + \frac{\left(a - \dfrac{(a+b)C}{1+C}\right)^2}{\dfrac{(a+b)C}{1+C}} = \frac{(a - bC)^2}{nC}. \tag{1}$$

It should be noted that this tests whether there was an effect due to the particular change at
the particular site, not whether the type of change of which this was a representative is
effective.
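The single-site calculation is short enough to sketch directly. The function names below are hypothetical and the figures in the example (20 accidents before, 10 after, control ratio 1) are invented for illustration; the second function evaluates the long form of equation (1) so that the two expressions can be compared.

```python
def single_site(a, b, C):
    """Return (k, chi2) for one site: k = a / (b C), chi2 = (a - b C)^2 / (n C)."""
    n = a + b
    k = a / (b * C)
    chi2 = (a - b * C) ** 2 / (n * C)
    return k, chi2


def single_site_chi2_long_form(a, b, C):
    """Chi-square written as observed-minus-expected terms for 'before' and 'after'."""
    n = a + b
    expected_before = n / (1 + C)
    expected_after = n * C / (1 + C)
    return ((b - expected_before) ** 2 / expected_before
            + (a - expected_after) ** 2 / expected_after)


# Hypothetical site: 20 accidents before, 10 after, control ratio 1.
k, chi2 = single_site(a=10, b=20, C=1.0)
print(f"k = {k:.2f}, chi-square = {chi2:.2f} on 1 d.f.")
```

Here $k$ below unity indicates a decrease relative to the control area; whether it is significant is judged by referring $\chi^2$ to the distribution with one degree of freedom.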


4. ANALYSIS WHEN REAL EFFECTS AT ALL SITES MAY BE ASSUMED EQUAL 


We consider here the method of estimation and the test of significance to be used when prior
considerations or internal evidence in the data suggest that there is no variation between the
real effects $\kappa_i$ at the various sites. The determination from the data of whether the $\kappa_i$ vary
is discussed in § 5.

We use the method of maximum likelihood. No special attempt will be made to justify its
use in the present context, except to say that it appears to give a sensible answer and that no
more appropriate method is known to the author.

The first problem to be dealt with is that of estimating $\kappa$, the common value of $\kappa_1, \kappa_2, \ldots, \kappa_N$.

The probability of the sample values $a_1, \ldots, a_N, b_1, \ldots, b_N$ is

$$P = \prod_{i=1}^{N} \binom{n_i}{a_i} \frac{(\kappa C_i)^{a_i}}{(1 + \kappa C_i)^{n_i}}.$$

The log-likelihood is therefore

$$L = \sum \left[\log \binom{n_i}{a_i} + a_i \log \kappa + a_i \log C_i - n_i \log (1 + \kappa C_i)\right].$$

Thus

$$\frac{\partial L}{\partial \kappa} = \sum \left[\frac{a_i}{\kappa} - \frac{n_i C_i}{1 + \kappa C_i}\right].$$

Equating to zero, $k$, the estimate of $\kappa$, is given by

$$\sum \frac{a_i - k b_i C_i}{1 + k C_i} = 0. \tag{2}$$

To solve this equation it is most conveniently expressed as

$$\sum \frac{n_i}{1 + k C_i} = \sum b_i. \tag{3}$$

The left-hand side of this equation may then be calculated for suitable trial values of $k$ until
a sufficiently accurate approximation to the solution is obtained. To find the sampling
variance of $k$, we have

$$\frac{\partial^2 L}{\partial \kappa^2} = \sum \left[-\frac{a_i}{\kappa^2} + \frac{n_i C_i^2}{(1 + \kappa C_i)^2}\right].$$

Putting $E(a_i) = n_i \kappa C_i/(1 + \kappa C_i)$, we obtain

$$E\left(\frac{\partial^2 L}{\partial \kappa^2}\right) = -\frac{1}{\kappa} \sum \frac{n_i C_i}{(1 + \kappa C_i)^2}.$$

Hence

$$\operatorname{var} k = -\frac{1}{E(\partial^2 L/\partial \kappa^2)} = \frac{\kappa}{\sum \dfrac{n_i C_i}{(1 + \kappa C_i)^2}}. \tag{4}$$
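The trial-value solution of equation (3) lends itself to a mechanical search. The sketch below uses bisection, which is a substitution for the hand calculation with trial values described in the text; it relies on the left-hand side of (3) decreasing steadily as $k$ increases. The function names and the example figures are hypothetical.

```python
def ml_common_effect(a, b, C, tol=1e-10):
    """Solve sum n_i / (1 + k C_i) = sum b_i for k by bisection (equation (3))."""
    if sum(a) <= 0 or sum(b) <= 0:
        raise ValueError("need at least one accident before and one after in total")
    n = [ai + bi for ai, bi in zip(a, b)]
    total_before = sum(b)

    def excess(k):
        # Left-hand side of (3) minus right-hand side; decreasing in k.
        return sum(ni / (1 + k * ci) for ni, ci in zip(n, C)) - total_before

    lo, hi = 0.0, 1.0
    while excess(hi) > 0:      # expand until the root is bracketed
        hi *= 2.0
    while hi - lo > tol * hi:
        mid = (lo + hi) / 2.0
        if excess(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def variance_of_k(k, a, b, C):
    """Equation (4), with the estimate k put in place of kappa."""
    return k / sum((ai + bi) * ci / (1 + k * ci) ** 2 for ai, bi, ci in zip(a, b, C))


# Hypothetical data for two sites with equal control ratios.
after, before, control = [10, 30], [20, 20], [1.0, 1.0]
k_hat = ml_common_effect(after, before, control)
print(f"k = {k_hat:.4f}, var k = {variance_of_k(k_hat, after, before, control):.4f}")
```

When all the $C_i$ are equal to a common $C$, equation (3) gives $k = \sum a_i/(C \sum b_i)$ directly, which provides a convenient check on the search.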

The discussion in the Appendix shows that the sampling distribution of $\log k$ is rather less
skew, and more nearly normal, than that of $k$ itself. Thus confidence limits and significance
tests would be better based on the assumption of a normal distribution of $\log k$. (By a general
property of maximum likelihood estimates, $\log k$ is the M.L. estimate of $\log \kappa$.)

The asymptotic sampling variance of $\log k$ is

$$\operatorname{var} \log k = -\frac{1}{E\{\partial^2 L/\partial(\log \kappa)^2\}} = \frac{1}{\sum \dfrac{n_i \kappa C_i}{(1 + \kappa C_i)^2}}. \tag{5}$$

This may be estimated by putting the sample value $k$ in place of $\kappa$, or, when testing
whether $k$ differs significantly from unity, by putting $\kappa = 1$.

For practical purposes, the formula for var $\log k$ may be further simplified. The function
$x/(1+x)^2$ varies only slowly with $x$ near $x = 1$: it attains its maximum of 0.250 at $x = 1$ and
falls only to 0.222 at $x = 0.5$ and at $x = 2.0$.

Thus if each $\kappa C_i/(1 + \kappa C_i)^2$ in expression (5) is replaced by 0.25, var $\log k$ becomes

$$\operatorname{var} \log k = \frac{4}{\sum n_i}. \tag{6}$$

Usually, most values of $C_i$ are fairly close to unity, say between 0.5 and 2.0. Even if all the
$C_i$ were 0.5 or 2.0, then the standard error would only be under-estimated by a factor
$(0.222/0.250)^{1/2} = 0.94$. In practice the under-estimation would normally be much less than
this.

These formulae for var $\log k$ are, according to the theory of the method of maximum
likelihood, only approximations valid for large sample size $N$. For moderate sample sizes, the
Appendix suggests that better estimates are obtained by multiplying them by

$$1 + 2/\sum n_i. \tag{7}$$

Thus, for example, if $\sum n_i = 40$, the variance from the simpler formula is 5 % too low. For
larger values of $\sum n_i$, it is probably not worth the trouble of applying this correction.
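Formulae (5), (6) and (7) can be compared numerically. This is a minimal sketch with hypothetical function names; the totals used in the example are invented, and the tabulation of $x/(1+x)^2$ is included only to show how flat the function is between 0.5 and 2.0.

```python
def var_log_k(k, n, C):
    """Equation (5), with k in place of kappa."""
    return 1.0 / sum(ni * k * ci / (1 + k * ci) ** 2 for ni, ci in zip(n, C))


def var_log_k_simple(n):
    """Equation (6): every term kappa C_i / (1 + kappa C_i)^2 replaced by 0.25."""
    return 4.0 / sum(n)


def small_sample_factor(n):
    """Equation (7): multiplier suggested for moderate total frequencies."""
    return 1.0 + 2.0 / sum(n)


def flatness(x):
    """The function x / (1 + x)^2, which has its maximum 0.25 at x = 1."""
    return x / (1 + x) ** 2


for x in (0.25, 0.5, 1.0, 2.0, 4.0):
    print(f"x = {x:4.2f}   x/(1+x)^2 = {flatness(x):.3f}")

# Hypothetical totals n_i for two sites, control ratios 0.5 and 2.0, testing kappa = 1.
n_sites, control = [30, 50], [0.5, 2.0]
exact = var_log_k(1.0, n_sites, control)
simple = var_log_k_simple(n_sites)
print(f"(5): {exact:.4f}   (6): {simple:.4f}   s.e. ratio: {(simple / exact) ** 0.5:.2f}")
print(f"(7) for total 40: {small_sample_factor([40]):.2f}")
```

With every $C_i$ at 0.5 or 2.0 the ratio of standard errors is the factor of about 0.94 quoted in the text, which is the extreme case considered there.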


5. TEST FOR EQUALITY OF THE $\kappa_i$


If all the $\kappa_i$ were in fact equal to a common κ, then the variation between the sample
estimates $k_i = a_i/(b_i C_i)$ would arise solely from the binomial distributions to which $b_i$ and $a_i$
are subject. This suggests that it may be possible to use a $\chi^2$ test for variations between the $\kappa_i$.

Consider the statistic:

$$\chi^2 = \sum_i\frac{(a_i - k b_i C_i)^2}{k C_i n_i}, \qquad (8)$$

where k is the maximum likelihood estimate given by equation (2). The corresponding
expression with κ instead of k is distributed as $\chi^2$ with N degrees of freedom; replacing κ by
the efficient estimate k reduces the degrees of freedom to N − 1.

If $\chi^2$ calculated from equation (8) is significant at an appropriate level, then it may be
concluded that there are differences between the $\kappa_i$.
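
The statistic (8) is simple to compute. The sketch below is one possible implementation; the function name and the two-site data are hypothetical. The counts are chosen so that, with $C_i = 1$, k = 1 satisfies the estimating equation.

```python
def chi_square(a, b, C, k):
    """Statistic (8): sum of (a_i - k*b_i*C_i)**2 / (k*C_i*n_i), n_i = a_i + b_i."""
    total = 0.0
    for ai, bi, ci in zip(a, b, C):
        ni = ai + bi
        total += (ai - k * bi * ci) ** 2 / (k * ci * ni)
    return total


# Hypothetical data: two sites with C_i = 1, for which k = 1.
a = [6, 4]            # "after" counts
b = [4, 6]            # "before" counts
C = [1.0, 1.0]
chi2 = chi_square(a, b, C, k=1.0)
degrees_of_freedom = len(a) - 1   # N - 1, because k is estimated from the data
print(chi2, degrees_of_freedom)
```

The value would then be referred to a table of $\chi^2$ with N − 1 degrees of freedom.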

This $\chi^2$ can be used to provide an approximate measure of the variability of the $\kappa_i$. This is
required later, in § 6. Consider the function $\chi^2(u)$ obtained by replacing k by u in equation (8).
Then in repeated sampling of the $b_i$ and $a_i$ from the binomial distributions corresponding to
their own $\kappa_i$, the expected value of $\chi^2(u)$ is

$$E(\chi^2(u)) = \sum_i\left\{\frac{n_i C_i(\kappa_i-u)^2}{u(1+\kappa_i C_i)^2} + \frac{\kappa_i}{u}\left(\frac{1+uC_i}{1+\kappa_i C_i}\right)^2\right\}.$$

Putting $\kappa_i - u = x_i$ and expanding in powers of $x_i$,

$$E(\chi^2(u)) = N + \sum_i\left\{\frac{x_i(1-uC_i)}{u(1+uC_i)} + \frac{x_i^2\,C_i(n_i-2+uC_i)}{u(1+uC_i)^2} + \dots\right\}.$$


Suppose now that the $\kappa_i$ are drawn at random from a population with mean κ and variance
var $\kappa_i$, and that $\kappa_i$ and $n_i$ are statistically independent. Suppose also that the sample size N
is large enough for the difference between the estimate k and this κ to be negligible. Then the
expected value of $\chi^2$, taken over the distribution of the $\kappa_i$, is

$$E(\chi^2) = E(\chi^2(\kappa)) = N + \operatorname{var}\kappa_i\sum_i\frac{C_i(n_i-2+\kappa C_i)}{\kappa(1+\kappa C_i)^2}. \qquad (9)$$

Since $E(\chi^2) = N - 1$ when var $\kappa_i$ = 0, and equation (9) is at best only correct to O(1), it is
probably better to replace N by N − 1 on the right-hand side. An expression for var $\kappa_i$ is then
given by

$$\operatorname{var}\kappa_i = \frac{\chi^2 - N + 1}{\sum_i\dfrac{C_i(n_i-2+\kappa C_i)}{\kappa(1+\kappa C_i)^2}}. \qquad (10)$$








6. ANALYSIS WHEN THE REAL EFFECTS AT ALL SITES CANNOT BE ASSUMED EQUAL 


If it is required to estimate the average effect for the particular changes studied, rather than 
the average effect in the population of changes of which they form a sample, then the pro- 
cedure of §4 is probably satisfactory. This situation, however, is not often likely to be of 
practical importance and will not be considered further. 

Suppose now that in the population of changes of which those studied form a sample, $\kappa_i$
follows some distribution, with mean κ and variance var $\kappa_i$. The most satisfactory procedure
would be to assume a convenient functional form for the distribution and estimate its
parameters by maximum likelihood. However, all plausible functions that were tried led to
intractable mathematics, and so a simpler approximate method has been used to obtain an
estimate k of the average effect κ and its sampling variance.

We shall continue to use the equation

$$\sum_i\frac{a_i - k b_i C_i}{1 + kC_i} = 0 \qquad (2)$$

as in §4 when the $\kappa_i$ did not vary. It is intuitively clear that this provides a reasonably
satisfactory estimate at least when the variations between the $\kappa_i$ are fairly small. Then,
provided the error of k as an estimate of κ is sufficiently small, we may write

$$\operatorname{var}\log k = \frac{\operatorname{var} S(\kappa)}{\left(\dfrac{\partial S(\kappa)}{\partial\log\kappa}\right)^2}, \qquad (11)$$

where

$$S(\kappa) = \sum_i\frac{a_i - \kappa b_i C_i}{1+\kappa C_i}.$$

This assumes that, over a sufficient range of κ, S(κ) is linearly related to κ. Var S(κ) must be
interpreted to include not only chance variations in S(κ) arising from the binomial distributions
of the $b_i$ and $a_i$, but also variations due to the distribution of the $\kappa_i$.

First, we have

$$\frac{\partial S(\kappa)}{\partial\log\kappa} = -\sum_i\frac{\kappa C_i n_i}{(1+\kappa C_i)^2}. \qquad (12)$$
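We now derive an expression for var S(κ). Since

$$S(\kappa) = \sum_i\frac{n_i}{1+\kappa C_i} - \sum_i b_i,$$

the first term of which is fixed, we have

$$\operatorname{var} S(\kappa) = \operatorname{var}\Big(\sum b_i\Big) = \sum\operatorname{var}(b_i). \qquad (13)$$

Now

$$\operatorname{var} b_i = E(b_i^2) - \{E(b_i)\}^2, \qquad (14)$$

and

$$E(b_i) = E_i\left(\frac{n_i}{1+\kappa_i C_i}\right) = n_i E_i\left(\frac{1}{1+\kappa_i C_i}\right), \qquad (15)$$

$$E(b_i^2) = E_i\left(\frac{n_i\kappa_i C_i}{(1+\kappa_i C_i)^2} + \frac{n_i^2}{(1+\kappa_i C_i)^2}\right) = n_i E_i\left(\frac{1}{1+\kappa_i C_i}\right) + n_i(n_i-1)E_i\left(\frac{1}{(1+\kappa_i C_i)^2}\right), \qquad (16)$$

where the symbol $E_i$ denotes the expectation over the distribution of $\kappa_i$.

To evaluate these expectations $E_i$, put $\kappa_i = \kappa + x_i$, where κ is the mean of the distribution
of $\kappa_i$ and the $x_i$ are assumed to be small. Then

$$\frac{1}{1+\kappa_i C_i} = \frac{1}{1+\kappa C_i}\left(1 - \frac{x_i C_i}{1+\kappa C_i} + \frac{x_i^2 C_i^2}{(1+\kappa C_i)^2} - \dots\right),$$

$$\frac{1}{(1+\kappa_i C_i)^2} = \frac{1}{(1+\kappa C_i)^2}\left(1 - \frac{2x_i C_i}{1+\kappa C_i} + \frac{3x_i^2 C_i^2}{(1+\kappa C_i)^2} - \dots\right).$$

Thus

$$E_i\left(\frac{1}{1+\kappa_i C_i}\right) = \frac{1}{1+\kappa C_i}\left(1 + \frac{C_i^2\operatorname{var}\kappa_i}{(1+\kappa C_i)^2} + \dots\right),$$

$$E_i\left(\frac{1}{(1+\kappa_i C_i)^2}\right) = \frac{1}{(1+\kappa C_i)^2}\left(1 + \frac{3C_i^2\operatorname{var}\kappa_i}{(1+\kappa C_i)^2} + \dots\right).$$

Substitution in the expressions (15) and (16) for $E(b_i)$ and $E(b_i^2)$ and then in expression (14)
for var ($b_i$) gives, on simplification,

$$\operatorname{var} b_i = \frac{n_i\kappa C_i}{(1+\kappa C_i)^2} + \frac{n_i C_i^2(n_i+\kappa C_i-2)\operatorname{var}\kappa_i}{(1+\kappa C_i)^4}. \qquad (17)$$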
Thus, finally, from equations (11), (12), (13) and (17),

$$\operatorname{var}\log k = \frac{\sum_i\dfrac{\kappa C_i n_i}{(1+\kappa C_i)^2} + \operatorname{var}\kappa_i\sum_i\dfrac{n_i C_i^2(n_i+\kappa C_i-2)}{(1+\kappa C_i)^4}}{\left(\sum_i\dfrac{\kappa C_i n_i}{(1+\kappa C_i)^2}\right)^2}.$$

Substituting for var $\kappa_i$ its estimate (10) gives

$$\operatorname{var}\log k = \frac{1+\phi}{\sum_i\dfrac{\kappa C_i n_i}{(1+\kappa C_i)^2}}, \qquad (18)$$

where

$$\phi = (\chi^2 - N + 1)\,\frac{\sum_i\dfrac{C_i(n_i+\kappa C_i-2)}{\kappa(1+\kappa C_i)^2}\cdot\dfrac{\kappa C_i n_i}{(1+\kappa C_i)^2}}{\sum_i\dfrac{C_i(n_i+\kappa C_i-2)}{\kappa(1+\kappa C_i)^2}\;\sum_i\dfrac{\kappa C_i n_i}{(1+\kappa C_i)^2}}. \qquad (19)$$

This formula for var log k is the same as was obtained when κ was assumed not to vary from
site to site, except for the factor 1 + φ.

The factor 1 + φ may for practical purposes be considerably simplified. In the term
$n_i + \kappa C_i - 2$, $\kappa C_i - 2$ is small compared with $n_i$. If it is omitted, φ becomes

$$\phi = (\chi^2 - N + 1)\frac{\sum X_i^2}{(\sum X_i)^2}, \qquad\text{where}\qquad X_i = \frac{\kappa C_i n_i}{(1+\kappa C_i)^2}.$$

Now it has already been noted that the ratio $\kappa C_i/(1+\kappa C_i)^2$ varies only slightly over the range
of $\kappa C_i$ of greatest importance, and so φ simplifies further to

$$\phi = (\chi^2 - N + 1)\frac{\sum n_i^2}{(\sum n_i)^2}.$$

Now this expression for φ is only correct to O(1), and so it is permissible to multiply by an
extra factor N/(N − 1); this gives:

$$\phi = \left(\frac{\chi^2}{N-1} - 1\right)\frac{N\sum n_i^2}{(\sum n_i)^2}. \qquad (20)$$

This seems to be a better expression when all the $n_i$ are the same; for in this case 1 + φ
reduces to

$$1 + \phi = \frac{\chi^2}{N-1}, \qquad (21)$$

instead of to $(\chi^2 + 1)/N$. The former expression is the ratio of a measure of variability of the
$k_i$ to the corresponding quantity when the $\kappa_i$ do not vary, and it is quite plausible that the
sampling variance of k should increase in the same proportion.

It is of interest to note that a factor similar to expression (21) is proposed by Finney (1952)
for use in probit analysis. Finney calls it a ‘heterogeneity factor’ and advocates its use to
increase the estimated sampling variance of the median effective dose when the departure
of sample frequencies from their expected values, as measured by $\chi^2$, is greatly in excess of
expectation.

Adopting expression (20), we have thus obtained for the final estimate of the sampling
variance of log k:

$$\operatorname{var}\log k = \frac{1 + \left(\dfrac{\chi^2}{N-1} - 1\right)\dfrac{N\sum n_i^2}{(\sum n_i)^2}}{\sum_i\dfrac{\kappa C_i n_i}{(1+\kappa C_i)^2}}, \qquad (22)$$

or approximately

$$\frac{4}{\sum n_i}\left\{1 + \left(\frac{\chi^2}{N-1} - 1\right)\frac{N\sum n_i^2}{(\sum n_i)^2}\right\}. \qquad (23)$$

It is suggested that the factor 1 + φ should be used only when $\chi^2$ is significant or nearly so,
say when P < 0.20. For larger probabilities, the random error in φ may outweigh the advantage
of having an unbiased estimate of variance.

Expressions (22) and (23) may, if necessary, be multiplied by the correction factor (7).


7. SUMMARY OF FORMULAE

In all cases k, the estimate of κ, or of the average of the $\kappa_i$, is the solution of the equation

$$\sum_i\frac{n_i}{1+kC_i} = \sum_i b_i. \qquad (3)$$

The sampling variance of log k is approximately

$$\operatorname{var}\log k = \frac{(1+\phi)\left(1 + \dfrac{2}{\sum n_i}\right)}{\sum_i\dfrac{\kappa C_i n_i}{(1+\kappa C_i)^2}}, \qquad (24)$$

where

$$\phi = \left(\frac{\chi^2}{N-1} - 1\right)\frac{N\sum n_i^2}{(\sum n_i)^2} \qquad (20)$$

and

$$\chi^2 = \sum_i\frac{(a_i - k b_i C_i)^2}{k C_i n_i}. \qquad (8)$$

κ may be replaced by the sample estimate k when the standard error is required, but by unity
when it is required to test whether κ = 1.

In appropriate cases, the formula may be simplified as follows:

(i) Omit 1 + φ when there is no firm evidence that $\kappa_i$ varies from site to site (P > 0.20).

(ii) Omit $1 + 2/\sum n_i$ if $\sum n_i$ is reasonably large, say greater than 40.

(iii) Replace the denominator by $\tfrac14\sum n_i$ if most of the values of $\kappa C_i$ are in the range ½ to 2.


8. NUMERICAL EXAMPLE


The data concern the effects of installing approximately circular roundabouts at cross-roads.
There are seven sites, and the data are as follows:

Site no.    b_i    a_i    C_i
   1          1      6    1.04
   2          6      3    1.25
   3          9      5    1.11
   4         16      5    2.36
   5         10      0    1.13
   6          2      2    1.69
   7          5      0    1.61
 Total       49     21

Thus for site 1, there was 1 accident in the before period, 6 in the after period, while in the
whole of the area concerned, there were 4% more casualties in the after period than in the
before period.

The estimate k of the average effect of the changes is given by equation (3). To solve this,
a trial value k = 0.3 was inserted in the left-hand side; this gave 48.16, which is slightly too
low. A smaller value of k, 0.2, was then tried; this gave 53.67, which is too high. k therefore
lies between 0.2 and 0.3 and by linear interpolation it is found to be k = 0.28, which gives
log k = −1.27.
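
The trial-and-error solution of equation (3) can be sketched in Python as follows. Bisection is used here in place of the linear interpolation described in the text, and the function names are hypothetical; the data are those of the table above.

```python
# Site data from the table: before counts, after counts and control ratios.
b = [1, 6, 9, 16, 10, 2, 5]
a = [6, 3, 5, 5, 0, 2, 0]
C = [1.04, 1.25, 1.11, 2.36, 1.13, 1.69, 1.61]
n = [ai + bi for ai, bi in zip(a, b)]


def lhs(k):
    """Left-hand side of equation (3): sum_i n_i / (1 + k*C_i)."""
    return sum(ni / (1.0 + k * ci) for ni, ci in zip(n, C))


def solve_k(lo=1e-6, hi=10.0, tol=1e-10):
    """Bisection on equation (3); the left-hand side decreases as k increases."""
    target = sum(b)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if lhs(mid) > target:
            lo = mid      # left-hand side too high, so k must be larger
        else:
            hi = mid
    return 0.5 * (lo + hi)


k = solve_k()
print(lhs(0.3), lhs(0.2), k)
```

The two trial values give left-hand sides close to the 48.16 and 53.67 quoted above, and the root lies close to the interpolated value 0.28.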

$\chi^2$ is then calculated from equation (8) and found to be 25.8, with 6 degrees of freedom.
This is highly significant, which means first that the effect of providing a roundabout is not
the same at all places, and secondly that it is necessary to introduce the factor 1 + φ in the
estimate of the variance of log k if one wishes to draw conclusions about the effect of
this type of change in general, rather than in the seven specific applications of it being
studied.

To test the significance of the departure of k from unity, 1 + φ is found from expression (20)
to be 5.281. The sampling variance of log k is given by expression (24); simplification (ii) may
be used, but not (i) or (iii). The denominator is calculated with κ = 1 for the purpose of the
test of significance. We find

$$\operatorname{var}\log k = \frac{5.281}{\sum_i C_i n_i/(1+C_i)^2} = 0.3197,$$

so that s.e. (log k) = 0.565, and t = −1.27/0.565 = −2.25. This value of t is significant at the
5% level and it may therefore be concluded that the provision of roundabouts of the sort
studied on the whole tends to reduce accident frequencies.
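
The arithmetic of this test can be retraced with a short Python sketch. It takes $\chi^2$ = 25.8 and log k = −1.27 from the text as given inputs rather than recomputing them; the function names are hypothetical. Because of rounding in the original hand calculation, the results agree with the quoted figures only to about two significant figures.

```python
import math

# Site data from the table in this section.
b = [1, 6, 9, 16, 10, 2, 5]
a = [6, 3, 5, 5, 0, 2, 0]
C = [1.04, 1.25, 1.11, 2.36, 1.13, 1.69, 1.61]
n = [ai + bi for ai, bi in zip(a, b)]

chi2 = 25.8      # value quoted in the text
log_k = -1.27    # value quoted in the text


def heterogeneity_factor(chi2, n):
    """1 + phi, with phi taken from expression (20)."""
    N = len(n)
    phi = (chi2 / (N - 1) - 1.0) * N * sum(x * x for x in n) / sum(n) ** 2
    return 1.0 + phi


def denominator(kappa, n, C):
    """Denominator of expression (24): sum_i kappa*C_i*n_i / (1 + kappa*C_i)**2."""
    return sum(kappa * ci * ni / (1.0 + kappa * ci) ** 2 for ni, ci in zip(n, C))


one_plus_phi = heterogeneity_factor(chi2, n)

# Significance test: denominator evaluated at kappa = 1, factor (7) omitted.
se_test = math.sqrt(one_plus_phi / denominator(1.0, n, C))
t = log_k / se_test

# Standard error for estimation: denominator evaluated at kappa = k = 0.28.
se_est = math.sqrt(one_plus_phi / denominator(0.28, n, C))

print(one_plus_phi, se_test, t, se_est)
```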

Since it has been established that κ is significantly less than unity, the best estimate of the
standard error of log k is not 0.565 obtained above, but 0.610, obtained by putting κ = 0.28
in the denominator of expression (24).

It is of interest to note how the standard error of log k would have differed from 0.610 if the
factor 1 + φ had been omitted and if the denominator of expression (24) had been simplified
according to (iii). One finds:

Standard errors of log k

                  Omitting 1+φ    Using 1+φ
Unsimplified          0.27           0.61
Simplified            0.26           0.55

Even in this rather extreme case, the simplified denominator gives much the same answer
as the more accurate one; the factor 1 + φ, however, is very important.
Using the factor (7), the above standard errors would be increased by at most 0.01.


The author is grateful to Dr F. Garwood for a number of suggestions. This paper was
prepared at the Road Research Laboratory and is published by permission of the Director of
Road Research.

APPENDIX


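Distributions of k and log k when the $\kappa_i$ are equal


We study here the moments of the distribution of log k, where k is the solution of equation (2), when all
the $\kappa_i$ are equal. Put y = log k − log κ, i.e. $k = \kappa e^y$. We shall obtain expressions for the first four
moments of y, and then for its second, third and fourth cumulants, which are the same as those of log k.
We have

$$\sum_i b_i = \sum_i\frac{n_i}{1+\kappa C_i e^y} = \sum_i\frac{n_i}{1+\kappa C_i}\left[1 + \frac{\kappa C_i(e^y-1)}{1+\kappa C_i}\right]^{-1}.$$

Expanding in powers of y,

$$\epsilon = \alpha y + \tfrac12(\alpha-2\beta)y^2 + \tfrac16(\alpha-6\beta+6\gamma)y^3 + \tfrac1{24}(\alpha-14\beta+36\gamma-24\delta)y^4 + \dots,$$

where

$$\epsilon = \sum_i\frac{n_i}{1+\kappa C_i} - \sum_i b_i,\quad \alpha = \sum_i\frac{n_i\kappa C_i}{(1+\kappa C_i)^2},\quad \beta = \sum_i\frac{n_i\kappa^2 C_i^2}{(1+\kappa C_i)^3},\quad \gamma = \sum_i\frac{n_i\kappa^3 C_i^3}{(1+\kappa C_i)^4},\quad \delta = \sum_i\frac{n_i\kappa^4 C_i^4}{(1+\kappa C_i)^5}.$$

We shall assume that errors in the estimation of κ are small, so that y is small. y is then of the same order
of smallness as ε. We may therefore write

$$y = A\epsilon + B\epsilon^2 + C\epsilon^3 + D\epsilon^4 + O(\epsilon^5),$$

where, by straightforward algebra,

$$A = 1/\alpha,$$

$$B = [\beta - \tfrac12\alpha]/\alpha^3,$$

$$C = [6\beta^2 - 3\alpha\beta + \alpha^2 - 3\alpha\gamma]/3\alpha^5,$$

$$D = [-\tfrac14\alpha^3 + \alpha^2\beta + \alpha^2\gamma + \alpha^2\delta - 5\alpha\beta\gamma - \tfrac52\alpha\beta^2 + 5\beta^3]/\alpha^7.$$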
Further straightforward algebra using the formulae for the moments of the binomial distribution leads
to

$$E(\epsilon) = 0,\quad E(\epsilon^2) = \alpha,\quad E(\epsilon^3) = \alpha - 2\beta,\quad E(\epsilon^4) = \alpha - 6\beta + 6\gamma + 3\alpha^2.$$

It may now be noted that the expansion of y in powers of ε up to $\epsilon^4$ is correct to order $N^{-2}$ where N is
the sample size. ($n_i$, κ and $C_i$ are all assumed to be O(1) for this purpose.) This follows from the facts that
A, B, C, ... are of order $N^{-1}, N^{-2}, N^{-3}, \dots$ and that the expectations of $\epsilon^2, \epsilon^3, \epsilon^4, \epsilon^5, \dots$ are of order
$N^1, N^1, N^2, N^2, \dots$. In what follows, all terms of higher order than $N^{-2}$ will be omitted.

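We now write down the first four moments of y. These are

$$\mu_1'(y) = AE(\epsilon) + BE(\epsilon^2) + CE(\epsilon^3) + DE(\epsilon^4)$$
$$= \frac{2\beta-\alpha}{2\alpha^2} - \frac{2\beta-\alpha}{3\alpha^5}(6\beta^2 - 3\alpha\beta + \alpha^2 - 3\alpha\gamma) + \frac{3}{\alpha^5}(-\tfrac14\alpha^3 + \alpha^2\beta + \alpha^2\gamma + \alpha^2\delta - 5\alpha\beta\gamma - \tfrac52\alpha\beta^2 + 5\beta^3),$$

$$\mu_2'(y) = A^2E(\epsilon^2) + 2AB\,E(\epsilon^3) + (B^2 + 2AC)E(\epsilon^4)$$
$$= \frac{1}{\alpha} - \frac{(2\beta-\alpha)^2}{\alpha^4} + \frac{3}{\alpha^4}(5\beta^2 - 3\alpha\beta + \tfrac{11}{12}\alpha^2 - 2\alpha\gamma),$$

$$\mu_3'(y) = A^3E(\epsilon^3) + 3A^2B\,E(\epsilon^4) = \frac{7(2\beta-\alpha)}{2\alpha^3},$$

$$\mu_4'(y) = A^4E(\epsilon^4) = \frac{3}{\alpha^2}.$$

Using the relationships between moments and cumulants, we find, still to order $N^{-2}$,

$$\kappa_2 = \frac{1}{\alpha} + \frac{3}{\alpha^4}(5\beta^2 - 3\alpha\beta + \tfrac{11}{12}\alpha^2 - 2\alpha\gamma) - \frac{5(2\beta-\alpha)^2}{4\alpha^4},$$

$$\kappa_3 = \frac{2(2\beta-\alpha)}{\alpha^3},\qquad \kappa_4 = 0.$$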


The variance $\kappa_2$ of y, and therefore of log k, is thus 1/α to order $N^{-1}$, in agreement with equation (5),
obtained by a different method. For moderate values of N, however, the second term may be important.
If $C_i = 1$, then

$$\kappa_2 = \frac{(1+\kappa)^2}{\kappa\sum n_i}\left\{1 + \frac{3 - 2\kappa + 3\kappa^2}{2\kappa\sum n_i}\right\}.$$

For values of κ near to unity, the factor $\tfrac12(3 - 2\kappa + 3\kappa^2)/\kappa$ is near 2 (it reaches 2.75 at κ = 0.5 and κ = 2.0).
In most cases it is therefore sufficiently accurate to use $1 + 2/\sum n_i$ as a correcting factor. It should also
be noted that the expression for the first moment of y shows that there is a bias of order 1/N in log k as an
estimate of log κ. This, however, is likely to be negligible in most circumstances.

The coefficients of skewness and kurtosis are

$$\gamma_1 = \kappa_3/\kappa_2^{3/2} = \frac{2(2\beta-\alpha)}{\alpha^{3/2}} + O(N^{-3/2}),\qquad \gamma_2 = \kappa_4/\kappa_2^2 = O(N^{-1}).$$

This means that as the sample size becomes large, these coefficients of the sampling distribution of log
k approach those for a normal distribution, for which $\gamma_1 = \gamma_2 = 0$.
The coefficients $\gamma_1$ and $\gamma_2$ for the distribution of k, rather than log k, may be obtained similarly, or via
the moments of log k. We find

$$\gamma_1 = \frac{4\beta+\alpha}{\alpha^{3/2}} + O(N^{-3/2}),\qquad \gamma_2 = O(N^{-1}).$$

Thus the coefficients of k also tend to those for a normal distribution, but not so rapidly, since 4β + α will
usually be numerically greater than 2(2β − α). For instance, if each $C_i = C$, then

$$\frac{\gamma_1(\log k)}{\gamma_1(k)} \sim \frac{2\kappa C - 2}{5\kappa C + 1}.$$

This lies between +1 and −1 for all κC greater than 1/7 and lies between +0.4 and −0.4 for all κC greater
than 0.4. It vanishes for κC = 1, which is in the middle of the likely range of κC. We have therefore
demonstrated that, provided certain reasonable assumptions are valid, log k is more nearly normally
distributed than k, and so is a more suitable test statistic.

These results are illustrated by two numerical examples, shown in Figs. 1 and 2. They deal with the
distribution of k when each $C_i$ is unity and κ is also unity. In this case,

$$k = \frac{\sum n_i - \sum b_i}{\sum b_i},$$

where $\sum b_i$ follows a binomial distribution

$$(\tfrac12 + \tfrac12)^{\sum n_i}.$$

Figs. 1 and 2 show the resulting distributions of k for $\sum n_i = 10$ and $\sum n_i = 30$, plotted on log-probability
scales. It is clear that the distribution of log k is very nearly normal, and that the formula given
in (c) for its variance, equivalent to expression (24), is a good approximation. Omitting the factor
$1 + 2/\sum n_i$ from the formula for the variance gives a noticeably poorer fit in both diagrams.


THE MULTIPLE-RECAPTURE CENSUS
I. ESTIMATION OF A CLOSED POPULATION


By J. N. DARROCH
Department of Mathematics, University of Cape Town


1. INTRODUCTION


1.1. A primary classification of the many problems which can now be included under the
heading of capture-recapture analysis is the one separating the census which uses multiple
recaptures from the census which does not.

The best example of the latter type is the ‘fisheries census’, where the main catching is 
done commercially and the experimenter’s job is to keep the population supplied with 
tagged individuals. In this case, recaptured fish are obviously not available to him for 
retagging and all he can hope for is that the captured tags are returned to him. Most of the 
paper by Chapman (1954) is devoted to this type of situation and contains some very simple 
estimates derived by the use of large-sample large-population Poisson approximations. 
Gulland (1955) shows how, if the catching is considered as a continuous-time process 
with constant effort, the natural and fishing mortality rates can be estimated from the 
behaviour of tagged fish alone, that is without any knowledge of total catches. It follows 
of course that with this knowledge estimates of population size are also available. 

In the multiple-recapture census it is usually the experimenter who does both tagging
and sampling and it is assumed that he employs a method of capture which does not kill
the animal or affect its future behaviour. The experiment then comprises a sequence of
samples $S_1, \dots, S_s$, say, where the members of $S_1, \dots, S_{s-1}$ are all tagged before being returned
to the population, while the members of $S_2, \dots, S_s$ are classified according to when, if at all,
they have been captured before. The majority of papers have discussed this census and
among them may be mentioned Bailey (1951); Chapman (1951, 1952); Craig (1953);
Goodman (1953); Hammersley (1953); Leslie & Chitty (1951); Leslie (1952) and Moran
(1952). In all of these papers except Goodman’s s is a constant. Goodman sets up a model
in which the number of samples sequentially depends on the total number of recaptured
tags, which is stipulated beforehand. As far as the individual sample sizes are concerned,
we notice that everywhere except in Hammersley’s paper, each sample $S_i$ is completed
when one of its statistics attains a prescribed value. This statistic is usually simply the
sample size, but in what has come to be known as the inverse sample census it is the number
of tagged or the number of untagged individuals recovered. In this connexion see Bailey
(1951) and Chapman (1952). It goes without saying that the theory of all these papers can
be applied to the estimation of the number of classes in a population if the classes are of
equal size and sampling is with replacement. The number of classes represented in a sample
constitutes its size and a class is ‘recaptured’ when it is represented in a subsequent
sample.

The latest extensions to the general problem have been made by Chapman who exploits
the natural stratification of animal populations, with respect to type (sex, species) of
individual (1955) and with respect to place (1956, with Junge).

1.2. In the present paper we treat the multiple-recapture census for which the number of
samples s is fixed (except in §§ 4.4, 4.5 and 5.6). In §§ 3 and 4 the sample sizes are regarded as
constants and in § 5 as binomial variables.

Most of the above-mentioned work on the multiple-recapture census has been applied 
to closed populations in which there is neither departure due to death or emigration, nor 
augmentation due to birth or immigration. The restriction to a closed population also 
prevails here, but in a second paper we shall take account of both departure and augmen- 
tation. 


1.3. To extract all the information from a multiple-recapture experiment, tagging must
be differentiated in order that the full ‘history’ of any individual can be inferred each time
it is captured. This can be effected in two ways. Either an individual is given a numbered
tag at its first appearance, or, each time it is captured it is given a new mark distinctive of
that capture. For some purposes, however, similar tagging is sufficient, where all that is
required is that each individual bears a mark after being captured. It need not be re-marked
when recaptured.

2. THE ALTERNATIVE MODELS


2.1. Let n be the total number of individuals in the population.

Let s be the total, fixed, number of samples taken.

Let $u_i$ be the number of individuals caught in the ith sample but not otherwise, $u_{ij}$ the
number caught in the ith and jth samples but not otherwise and similarly $u_{ijk}$, etc.

Denoting a subset of the integers 1, 2, ..., s by ω, let

$$r = \sum_\omega u_\omega = \sum_i u_i + \sum_{i<j} u_{ij} + \dots + u_{12\dots s},$$

the total number of different individuals caught in the complete experiment.

Let $a_i$ be the size of the ith sample. Then $a_i = \sum_{\omega\ni i} u_\omega$, where summation is over all subsets ω
which include the integer i.

We derive two probability distributions for $\{u_\omega\}$.

Let the probability that any individual is caught in the ith sample be $p_i = 1 - q_i$. Thus we
assume that all individuals are equally likely to be members of any given sample. Further,
we shall assume that, for any individual, the events: caught in the ith sample, i = 1, 2, ..., s,
are independent.

The probability of any individual escaping capture throughout the experiment is

$$\prod_i q_i = Q, \text{ say.}$$

The probability of being caught in the i, ..., l samples and no others is

$$\frac{p_i}{q_i}\cdots\frac{p_l}{q_l}\,Q = P_{i\dots l}, \text{ say.}$$

Clearly, the probability density of $\{u_\omega\}$ is multinomial, viz.

$$P[\{u_\omega\}] = \frac{n!}{(n-r)!\,\prod_\omega u_\omega!}\,Q^{n-r}\prod_\omega P_\omega^{u_\omega}, \qquad (A)$$

where $0 \le u_\omega \le n$ subject to $0 \le r = \sum u_\omega \le n$. We notice that

$$Q^{n-r}\prod_\omega P_\omega^{u_\omega} = \prod_i p_i^{a_i} q_i^{n-a_i}.$$

Therefore, (A) may also be written

$$P[\{u_\omega\}] = \frac{n!}{(n-r)!\,\prod_\omega u_\omega!}\prod_i p_i^{a_i} q_i^{n-a_i}. \qquad (A')$$

We now find $p[\{u_\omega\} \mid \{a_i\}]$, the conditional density of $\{u_\omega\}$ given $\{a_i\}$. It is obvious from first
considerations, and is easily deduced from (A), that the $a_i$ are independent binomial variables
$B[n, p_i]$. Therefore

$$p[\{a_i\}] = \prod_i\binom{n}{a_i}p_i^{a_i}q_i^{n-a_i}$$

and

$$p[\{u_\omega\} \mid \{a_i\}] = \frac{n!}{(n-r)!\,\prod_\omega u_\omega!}\prod_i\binom{n}{a_i}^{-1}, \qquad (B)$$

where $\max\{a_i\} \le r \le \sum a_i$ (strictly, $\min(n, \sum a_i)$),

$$0 \le u_i \le a_i,\qquad 0 \le u_{ij} \le \min(a_i, a_j),\qquad\text{etc.,}$$

with the linear constraints on the $u_\omega$,

$$\sum_{\omega\ni i} u_\omega = a_i \qquad (i = 1, 2, \dots, s).$$

(B) is a generalized hypergeometric density.

2.2. Which of models (A) and (B) is appropriate to any given experiment?

In (A) the sample sizes $a_i$ are random variables while the $p_i$ are parameters. This model
is therefore applicable when the effort put into the catching of every sample is fixed before
the experiment begins since the $p_i$ are then fixed, though unknown. (B), on the other hand,
involves the $a_i$ as parameters and should be used only when the experimenter is determined
to catch no more and no less than $a_i$ individuals at the ith sample; and he will only be able
to do this when animals are fairly easily caught. In fact, if we had to generalize, we could
say that (B) is likely to be appropriate when the main limiting factor on sample size is the
trouble involved in marking animals and (A) when it is the difficulty in catching them.

Most previous work has been based on (B) and (A) is new. (Hammersley (1953) constructed
a model in which the $a_i$ are binomial variables but this model involves a flaw which
invalidates the estimation based on it, as we shall show in paper II.) It has been customary
to derive (B) as a chain of s − 1 hypergeometric probabilities $P[S_i \mid S_1, S_2, \dots, S_{i-1}]$, and this
has led to its simplicity being obscured either by the notation employed, by considering
only the terms involving n or by making sampling-with-replacement approximations.

As well as being the exact probability description of the capture-recapture experiment
when the $a_i$ are constants, (B) may also be regarded as a very useful device for eliminating
the nuisance parameters $p_i$ when the $a_i$ are variables; it leaves only n to be estimated and
provides a sufficient statistic for n, namely r. One feels intuitively that to estimate n as if
the $a_i$ are constants, when in fact they are not, is not a serious misrepresentation, and this
feeling is strengthened by the discovery that the two models lead to the same estimate $\hat n$
of n, and to the same asymptotic estimate of var($\hat n$). Apart from demonstrating this, it
may be wondered why there is any need to consider (A) at all. The main reason is that (A)
is capable of generalizations which (B) is unable to accommodate and it is necessary to
discuss (A) for the closed population before going on to these generalizations, some of which
are the subject of paper II. Also, the ease with which a multinomial probability is manipulated
gives it considerable advantage over a hypergeometric probability, even for a
closed population.


3. ESTIMATION USING MODEL B


3.1. The fact that r is sufficient for n has the important implication that n can be estimated
from similar tagging.
Regarding (B) as the likelihood L(n) of n, and omitting constant terms,

$$\log L(n) = \sum_i\log(n-a_i)! - (s-1)\log n! - \log(n-r)!.$$

An equation for the maximum likelihood estimate $\hat n$ of n can be found by equating
Δ log L(n) to zero. This involves an error of less than unity in the solution and is equivalent
to the ‘ratio method’ of maximizing L, which equates L(n) to L(n − 1). Since Δ log n! = log n,
$\hat n$ must be one of the roots of

$$\prod_i(n-a_i) = n^{s-1}(n-r). \qquad (1)$$

(1) has a single finite root greater than r which maximizes the likelihood, except when r
takes one of its extreme values. (i) If $r = \sum_i a_i$, no individual is observed more than once
and $\hat n$ is infinite. (ii) If $r = \max\{a_i\} = a_m$, say, no individual is observed which does not
appear in the mth sample and $\hat n = r = a_m$. It is of course in the nature of the capture-
recapture experiment that (i) and (ii) are extremely unlikely to occur.
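
The ‘ratio method’ lends itself to a direct integer search. The following Python sketch, with a hypothetical function name and made-up sample sizes, returns the largest integer n for which L(n) ≥ L(n − 1), that is, for which $\prod(n-a_i) \ge n^{s-1}(n-r)$. It is an illustration under these assumptions, not the procedure used in the paper.

```python
from math import prod


def integer_mle(a, r):
    """Largest integer n with prod(n - a_i) >= n**(s-1) * (n - r).

    a : list of sample sizes a_i
    r : number of distinct individuals caught
    Returns None in case (i), r = sum(a), where the estimate is infinite.
    """
    s = len(a)
    if r >= sum(a):
        return None
    n = r                      # L(r) >= L(r - 1) always holds
    # Step upwards while the likelihood ratio L(n + 1) / L(n) is at least one.
    while prod(n + 1 - ai for ai in a) >= (n + 1) ** (s - 1) * (n + 1 - r):
        n += 1
    return n


# Hypothetical two-sample experiment: 50 and 40 caught, 10 of them twice.
print(integer_mle([50, 40], 50 + 40 - 10))
```

Integer arithmetic is used throughout, so no rounding error enters the comparison; for large populations a bisection search would be quicker than stepping one unit at a time.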
(1) may also be obtained by equating r to its expected value ρ, say. For

$$n - \rho = E[n-r] = \frac{\prod_i(n-a_i)}{n^{s-1}}. \qquad (2)$$

This follows from the identity in n and the $a_i$

$$\prod_i\binom{n}{a_i} = \sum_{\{u_\omega\}}\frac{n!}{(n-r)!\,\prod_\omega u_\omega!},$$

since

$$E[n-r] = n\prod_i\binom{n}{a_i}^{-1}\sum_{\{u_\omega\}}\frac{(n-1)!}{(n-r-1)!\,\prod_\omega u_\omega!} = n\prod_i\binom{n}{a_i}^{-1}\prod_i\binom{n-1}{a_i} = \frac{\prod_i(n-a_i)}{n^{s-1}}.$$

Similarly,

$$E[(n-r)(n-r-1)] = \frac{\prod_i(n-a_i)(n-a_i-1)}{n^{s-1}(n-1)^{s-1}}, \qquad (3)$$

with corresponding expressions for the higher factorial moments of n − r.


3.2. To apply maximum likelihood large-sample theory in finding the variance of an
estimate, it is necessary that the following three conditions are fulfilled. (a) The sample
size must be a constant. (b) The likelihood must consist merely of the product of the
individual likelihoods for the separate sample members. (c) The range of summation of the
random variables must be independent of the unknown parameters. Except for one model
discussed in § 5.7, which is artificially constructed for the purpose, no other model of this
paper satisfies these three conditions. In the present context r is the sample size, as distinct
from the $a_i$ which are the separate catch sizes, and (B) obviously does not satisfy (a) or (b).
(It does satisfy condition (c) provided $n > \sum_i a_i$.) Model (A) breaks all three conditions.

We can, however, use the ‘δ-technique’ to find the asymptotic variance and bias of $\hat n$.
(1) may be regarded as defining $\hat n$ as a function $\hat n(r)$ of r. By (2), $\hat n(\rho) = n$ and we may
therefore expand $\hat n$ about n as a Taylor series in powers of r − ρ. If we consider $\hat n$ and r as
continuous variables we can say that $d\hat n/dr$ is finite and differentiable in the range

$$a_m \le r \le \sum_i a_i - 1.$$

Confining attention to this range, that is ignoring the possibility of (i) occurring, we have

$$\hat n - n = (r-\rho)\left[\frac{d\hat n}{dr}\right]_\rho + \tfrac12(r-\rho)^2\left[\frac{d^2\hat n}{dr^2}\right]_{r'}, \qquad (4)$$

where r′ lies between r and ρ. Differentiating (1)

$$\left[\frac{d\hat n}{dr}\right]_\rho = \frac{1}{n-\rho}\left[\frac{1}{n-\rho} + \frac{s-1}{n} - \sum_i\frac{1}{n-a_i}\right]^{-1} = O(1),$$

where f(n) = O(φ(n)) means |f(n)| < Kφ(n) as n → ∞ and each $a_i$ → ∞ in such a way that the
$a_i/n$, and hence also ρ/n, are constant. Also

$$\frac{d^2\hat n}{dr^2} = O\left(\frac{1}{n}\right)\qquad\text{and}\qquad\frac{d^3\hat n}{dr^3} = O\left(\frac{1}{n^2}\right).$$

Further, var (r) = var (n − r) = E[(n − r)(n − r − 1)] + (n − ρ) − (n − ρ)².
Therefore, using (3)

$$\operatorname{var}(r) \sim (n-\rho)^2\left[\frac{1}{n-\rho} + \frac{s-1}{n} - \sum_i\frac{1}{n-a_i}\right] = O(n), \qquad (5)$$

where f(n) ~ φ(n) denotes that f(n) = φ(n)[1 + O($n^{-1}$)]. Making further use of factorial
moments we find that

$$E[(r-\rho)^3] = O(n)\qquad\text{and}\qquad E[(r-\rho)^4] = O(n^2).$$

Squaring (4) and taking expected values

$$E[(\hat n-n)^2] \sim \operatorname{var}(r)\left[\frac{d\hat n}{dr}\right]_\rho^2.$$

(The error in replacing $E[(r-\rho)^2]$ over the restricted range by var (r) is $O(n^2c^n)$, 0 < c < 1,
which is $o(n^{-1})$.) Thus, for the limit process stated

$$E[(\hat n-n)^2] \sim \left[\frac{1}{n-\rho} + \frac{s-1}{n} - \sum_i\frac{1}{n-a_i}\right]^{-1} = O(n). \qquad (6)$$

Let $b = E[\hat n] - n$, the positive bias of $\hat n$. Then extending the Taylor series by one term
and taking expected values, we find that

Since b = O(1) and $E[(\hat n-n)^2] = O(n)$, $E[(\hat n-n)^2] \sim \operatorname{var}(\hat n)$. Thus, it makes no difference
whether we speak of mean square error or of variance and (6) is equivalent to

$$\operatorname{var}(\hat n) \sim \left[\frac{1}{n-\rho} + \frac{s-1}{n} - \sum_i\frac{1}{n-a_i}\right]^{-1}. \qquad (8)$$

3.3. When s = 2, $r = a_1 + a_2 - u_{12}$ and (B), written in standard hypergeometric form, is

$$\binom{a_1}{u_{12}}\binom{n-a_1}{a_2-u_{12}}\Big/\binom{n}{a_2}.$$

(1) is a linear equation with solution $\hat n = a_1a_2/u_{12}$, the familiar Petersen estimate.
Chapman (1951) showed that

$$n' = \frac{(a_1+1)(a_2+1)}{u_{12}+1} - 1$$

is preferable to $\hat n$, since it is always finite and is almost unbiased. This could very nearly be
inferred from (7). For that formula gives $(n-a_1)(n-a_2)/(a_1a_2)$ as the approximate bias of
$\hat n$, which is estimated by $(a_1-u_{12})(a_2-u_{12})/u_{12}^2$, and

$$\hat n - n' = \frac{(a_1-u_{12})(a_2-u_{12})}{u_{12}(u_{12}+1)}.$$

We notice that n′ is the solution of $(n-a_1)(n-a_2) = (n+1)(n-r)$, but unfortunately it
is not true that $\prod_i(n-a_i) = (n+1)^{s-1}(n-r)$ yields an almost unbiased estimate for general
values of s.

Chapman (1952) showed how n′ can be made the basis of almost unbiased s-sample
estimation. We shall wait until § 5.3 to comment on his recommendations, as they can be
more easily discussed for model (A) than for (B).
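
For two samples both estimates are one-line calculations. The Python sketch below uses hypothetical function names and made-up counts, and also checks numerically the expression given above for the difference between the two estimates.

```python
def petersen(a1, a2, u12):
    """Maximum likelihood estimate a1*a2/u12 (infinite when nothing is recaptured)."""
    return float("inf") if u12 == 0 else a1 * a2 / u12


def chapman(a1, a2, u12):
    """Chapman's (1951) modified estimate (a1 + 1)(a2 + 1)/(u12 + 1) - 1."""
    return (a1 + 1) * (a2 + 1) / (u12 + 1) - 1


# Hypothetical experiment: 50 caught and tagged, 40 caught later, 10 of them tagged.
a1, a2, u12 = 50, 40, 10
n_hat = petersen(a1, a2, u12)
n_prime = chapman(a1, a2, u12)
difference = (a1 - u12) * (a2 - u12) / (u12 * (u12 + 1))
print(n_hat, n_prime, n_hat - n_prime, difference)
```

The modified estimate stays finite even when $u_{12} = 0$, which is the property noted in the text.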


3.4. Turning attention now from point estimation to confidence interval estimation
of n, we assume that r is approximately normally distributed about ρ. We have already
noted that r has moments $\mu_2 = O(n)$ and $\mu_3 = O(n)$. Therefore

$$\gamma_1 = \frac{\mu_3}{\mu_2^{3/2}} = O\left(\frac{1}{\sqrt n}\right).$$

Also, one finds that $\gamma_2 = \mu_4/\mu_2^2 - 3 = O(n^{-1})$.
The expected value of r, regarded as a function of n, is

$$\rho(n) = n - \frac{\prod_i(n-a_i)}{n^{s-1}},$$

and in this notation, the equation for $\hat n$ is $\rho(\hat n) = r$ or $\hat n = \rho^{-1}(r)$, say. Let $\sigma^2(n)$ denote the
variance of r, given by (5). Then

$$P[r - k\sigma(n) < \rho(n) < r + k\sigma(n)] = 1 - \epsilon,$$

where k = k(ε) may be read from normal tables. The inequalities are approximately

$$r - k\sigma(\hat n) < \rho(n) < r + k\sigma(\hat n),$$

that is,

$$r_1 < \rho(n) < r_2, \text{ say,}$$

or, since $\rho^{-1}(r)$ can be shown to be a monotone increasing function of r,

$$\rho^{-1}(r_1) < n < \rho^{-1}(r_2).$$

$\rho^{-1}(r_1)$ and $\rho^{-1}(r_2)$ (which are precisely the same as $\hat n(r_1)$ and $\hat n(r_2)$) may be regarded as first
approximations to the solutions for n of $r - k\sigma(n) = \rho(n)$ and $r + k\sigma(n) = \rho(n)$, respectively.
We may now, if we wish, proceed to better approximations $n_1^*$ and $n_2^*$ say, obtaining

$$n_1^* < n < n_2^*$$

as the 100(1 − ε)% confidence interval for n.

4. ONE INDIVIDUAL PER CAPTURE


4.1. When each sample is of size one, (B) is the obvious probability model to use, though
(A) can be adapted for the purpose as we shall show in § 5.6. (B) is the basis of the present
section and is equal to

$$\frac{1}{n^s}\,\frac{n!}{(n-r)!\,\prod_\omega u_\omega!}, \qquad (B_1)$$

where $\prod_\omega u_\omega! = 1$, since every $u_\omega$ = 0 or 1 and therefore every $u_\omega!$ = 1.
Summing ($B_1$) over all values of $\{u_\omega\}$ such that $\sum_i u_i = f_1$, $\sum_{i<j} u_{ij} = f_2$, ..., we obtain

$$\frac{1}{n^s}\,\frac{n!}{(n-r)!}\,\frac{s!}{(1!)^{f_1}(2!)^{f_2}\cdots f_1!\,f_2!\cdots} \qquad (9)$$

as the probability of not catching n − r individuals and of catching $f_x$ x times where
x = 1, 2, ..., s and $\sum_x f_x = r$, $\sum_x xf_x = s$. The step from ($B_1$) to (9) can be made by considering
the number of ways of distributing s balls in r cells in such a way that none is empty. The
argument, which need not be included here, follows from putting $u_i = 1$ if the ith ball is
alone, $u_{ij} = 1$ and $u_i = u_j = 0$ if it is with the jth and no other, etc.
Summing (9) over all values of $\{f_x\}$ such that $\sum f_x = r$ and $\sum xf_x = s$, we obtain (Jordan,
1947, p. 206)

$$\frac{1}{n^s}\,\frac{n!}{(n-r)!}\,\mathfrak{S}_s^r \qquad (10)$$

as the probability of catching r individuals with s samples, where $\mathfrak{S}_s^r = \Delta^r(0^s)/r!$, a Stirling
number of the second kind. (10) was found by Craig (1953) when considering the estimation
of a population of butterflies.
4-2. For the purpose of estimating n little alteration is required to the general results of
§§ 3-2, 3-4.
$\hat{n}$ is the solution of
$$(n-1)^s = n^{s-1}(n-r), \qquad (11)$$
which may be approximately written
$$e^{-s/n} = 1 - r/n.$$
The appropriate limit process is now $n \to \infty$, $s \to \infty$ such that s/n is constant, and we find that

$$\operatorname{var}(\hat{n}) \sim n\,[e^{s/n} - 1 - s/n]^{-1} = O(n).$$

Confidence interval estimation may be performed with
$$\rho(n) = n[1 - e^{-s/n}], \qquad \sigma^2(n) = n e^{-2s/n}[e^{s/n} - 1 - s/n].$$
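The interval construction of § 3-4 can be sketched numerically for the one-individual-per-capture case, using the approximate mean $n[1 - e^{-s/n}]$ and variance $n e^{-2s/n}[e^{s/n} - 1 - s/n]$ of r. The following Python sketch is illustrative only: the function names, the use of bisection, the default k = 1.96 and the example values r = 60, s = 100 are assumptions made here, not part of the paper, and only the first approximations $\rho^{-1}(r_1)$, $\rho^{-1}(r_2)$ are computed.

```python
import math


def rho(n, s):
    """Approximate E[r] when s single catches are made from n individuals."""
    return n * -math.expm1(-s / n)


def sigma(n, s):
    """Approximate standard deviation of r for the same experiment."""
    c = s / n
    return math.sqrt(n * math.exp(-2.0 * c) * (math.exp(c) - 1.0 - c))


def rho_inverse(target, s):
    """Solve rho(n) = target for n by bisection; rho(n) increases towards s."""
    if target <= 0.0:
        return 0.0
    if target >= s:
        return math.inf
    lo, hi = target, 2.0 * target      # rho(n) < n, hence rho(lo) < target
    while rho(hi, s) < target:         # expand until the root is bracketed
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if rho(mid, s) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def interval_for_n(r, s, k=1.96):
    """Point estimate and first-approximation confidence limits for n."""
    if not 0 < r < s:
        raise ValueError("need 0 < r < s (at least one recapture)")
    n_hat = rho_inverse(r, s)
    spread = k * sigma(n_hat, s)       # sigma evaluated at the estimate
    return n_hat, rho_inverse(r - spread, s), rho_inverse(r + spread, s)


# Hypothetical experiment: 100 catches yielding 60 different individuals.
n_hat, lower, upper = interval_for_n(r=60, s=100)
print(n_hat, lower, upper)
```

The upper limit is returned as infinite whenever $r + k\sigma(\hat{n})$ reaches s, which reflects the fact that an experiment with almost no recaptures cannot bound n from above.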
4-3. Suppose now that instead of s being fixed and r variable, sampling is continued until
a fixed number, r, of individuals have been caught. Then
$P[r-1$ individuals in $s-1$ samples and a new one at the sth sample]

$$= \frac{1}{n^{s-1}}\,\frac{n!}{(n-r+1)!}\,\mathfrak{S}_{s-1}^{r-1}\,\frac{n-r+1}{n}$$

$$= \frac{1}{n^{s}}\,\frac{n!}{(n-r)!}\,\mathfrak{S}_{s-1}^{r-1}, \qquad (12)$$

where $s = r, r+1, \ldots$.

This model will be referred to as inverse, since the term sequential has already been given 
by Goodman to his census which we discuss in § 4-4. We remark that the maximum likeli- 
hood estimate of n remains the same as for the direct model (10). 

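Let $\phi(t)$ be the probability generating function of s. Now

$$\sum_{s=r}^{\infty} \mathfrak{S}_s^r\, t^s = \frac{t^r}{(1-t)(1-2t)\cdots(1-rt)}$$
(Jordan, p. 175). It follows that
$$\phi(t) = \frac{(n-1)!\; t^r}{(n-r)!\,(n-t)(n-2t)\cdots(n-(r-1)t)}.$$
Differentiating $\phi(t)$,
$$E[s] = n\sum_{k=0}^{r-1}\frac{1}{n-k}, \qquad (13)$$
$$\operatorname{var}(s) = n\sum_{k=0}^{r-1}\frac{k}{(n-k)^2}. \qquad (14)$$
The method of moments estimation equation, obtained by equating observed and
expected s, is
$$\sum_{k=0}^{r-1}\frac{1}{n-k} = \frac{s}{n}. \qquad (15)$$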

As Craig pointed out, (15) is the exact maximum likelihood equation for the likelihood
$\dfrac{1}{n^s}\,\dfrac{n!}{(n-r)!}$, and the solutions of (11) and (15) therefore differ by one at most. (15) has no
solution $\hat{n} \ge r$ when $s > r(1 + \tfrac{1}{2} + \ldots + 1/r) = s_0(r)$ say. As far as the method of moments
interpretation of (15) is concerned, this is explained by the fact that, since $n \ge r$, $E[s] \le s_0(r)$.
That is, there is no expected value of s to which an observed value greater than $s_0(r)$ corresponds,
and it is therefore meaningless to equate them. It is not likely that s will ever be
greater than $s_0(r)$ in practice. $s_0(100)$, for instance, is 519. Before making 519 catches to
obtain 100 individuals, the experimenter would be sure to doubt the randomness of his
sampling or the correctness of his (necessary) information that $n \ge 100$.
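The quantities in this paragraph are easy to evaluate directly. The sketch below is an illustration rather than part of the original argument: it computes $E[s] = n\sum_{k=0}^{r-1} 1/(n-k)$ and $s_0(r) = r(1 + 1/2 + \ldots + 1/r)$, and solves the moment equation $E[s] = s$ for n by bisection. The function names and the example values are hypothetical.

```python
import math


def expected_catches(n, r):
    """E[s] = n * sum_{k=0}^{r-1} 1/(n-k): mean catches needed for r individuals."""
    return n * sum(1.0 / (n - k) for k in range(r))


def s0(r):
    """Largest attainable E[s], reached at n = r: r(1 + 1/2 + ... + 1/r)."""
    return r * sum(1.0 / j for j in range(1, r + 1))


def moment_estimate(r, s):
    """Solve E[s] = s for n >= r by bisection (E[s] decreases as n grows)."""
    if s <= r:
        return math.inf                # no recaptures: the estimate is infinite
    if s > s0(r):
        raise ValueError("no solution with n >= r")
    lo, hi = float(r), 2.0 * r
    while expected_catches(hi, r) > s:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if expected_catches(mid, r) > s:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


print(round(s0(100)))                  # the bound quoted in the text for r = 100
print(moment_estimate(r=100, s=250))   # hypothetical experiment
```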

For the likelihood (12), s is sufficient for n and ignoring the possibility that s = r (which
makes $\hat{n} = \infty$), using (14) and the same technique as in § 3-2, we find that

$$\operatorname{var}(\hat{n}) \sim n\left[\sum_{k=0}^{r-1}\frac{k}{(n-k)^2}\right]^{-1} = O(n),$$

$$\beta \sim n\sum_{k=0}^{r-1}\frac{k}{(n-k)^3}\left[\sum_{k=0}^{r-1}\frac{k}{(n-k)^2}\right]^{-2} = O(1),$$

where the limit process is now $n \to \infty$, $r \to \infty$ such that r/n is constant.








It is readily shown that s is normally distributed neglecting terms $O(n^{-1/2})$. Therefore,
confidence interval estimation of n based on (13) and (14) proceeds as in § 3-4 except that
$\hat{n} = \hat{n}(s)$ is now a monotone-decreasing function of s.

All of the formulae appearing in this subsection may of course be simplified by using 
integral approximations to the sums of reciprocal powers of n —k. 


4-4. The sequential census of Goodman (1953) can be described as follows. Before the
experiment is begun, an infinite sequence $\{a_i\}$ of sample sizes is postulated together with l,
the number of tagged individuals to be recovered. Sampling stops at the completion of the
sth sample, s being defined by

$$\sum_{i=1}^{s-1} a_i - r_{s-1} < l, \qquad \sum_{i=1}^{s} a_i - r_s \ge l,$$

where $r_s$ denotes the number of individuals observed in the first s samples.

When all $a_i = 1$, sampling stops as soon as $s - r = l$. We comment briefly on this particular
case using the approach of the present section. We require

$P[r$ individuals in $r+l-1$ samples and a previously caught individual at the $(r+l)$th]

$$= \frac{1}{n^{r+l-1}}\,\frac{n!}{(n-r)!}\,\mathfrak{S}_{r+l-1}^{r}\,\frac{r}{n}$$

$$= \frac{r}{n^{r+l}}\,\frac{n!}{(n-r)!}\,\mathfrak{S}_{r+l-1}^{r}. \qquad (16)$$

Maximum likelihood estimation remains the same as before and r or s $(= r + l)$ is sufficient
for n. There is, however, a minimum-variance unbiased estimate. Since

$$\sum_{r=1}^{n}\frac{r}{n^{r+l}}\,\frac{n!}{(n-r)!}\,\mathfrak{S}_{r+l-1}^{r} = 1$$
is an identity in n for any positive integer l,
$$\sum_{r=1}^{n}\frac{r}{n^{r+l}}\,\frac{n!}{(n-r)!}\left[\mathfrak{S}_{r+l}^{r} - n\,\mathfrak{S}_{r+l-1}^{r}\right] = 0.$$

Therefore $\mathfrak{S}_{r+l}^{r}/\mathfrak{S}_{r+l-1}^{r}$ is an unbiased estimate of n. Moreover, it is uniquely unbiased as is
easily seen by induction on n, and because it is sufficient it has minimum variance (Rao,
1952).

Using more general methods, Goodman expressed the same estimate as $K(r,l)/K(r,l-1)$,
where $K(r,0) = r$ and $K(r,l) = r\sum_{t=1}^{r} K(t,l-1)$. By observing that $\mathfrak{S}_{r+l}^{r} = \Delta^{-1}[(r+1)\,\mathfrak{S}_{r+l}^{r+1}]$
(Jordan, p. 171), and defining $\Delta^{-1}\phi(r) = \sum_{t=0}^{r-1}\phi(t) + \text{constant}$ (Jordan, p. 101), it follows that
$K(r,l) = r\,\mathfrak{S}_{r+l}^{r}$, which accounts for the equivalence of the two expressions for the unbiased
estimate. Goodman made reference to tables facilitating the calculation of this estimate.
He also provided another basis for estimation by showing that as $n \to \infty$, l remaining constant,
the distribution of $s^2/n$ tends to that of $\chi^2_{2l}$.
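The unbiasedness of the ratio of Stirling numbers can be checked exactly for small populations. The sketch below is illustrative and not from the paper: it evaluates the stopping probabilities $\frac{r}{n^{r+l}}\frac{n!}{(n-r)!}\mathfrak{S}_{r+l-1}^{r}$ in rational arithmetic, confirms that they sum to one, and confirms that the mean of $\mathfrak{S}_{r+l}^{r}/\mathfrak{S}_{r+l-1}^{r}$ equals n. The function names are hypothetical.

```python
from fractions import Fraction
from math import factorial


def stirling2(n, k):
    """Stirling number of the second kind via S(n,k) = S(n-1,k-1) + k*S(n-1,k)."""
    table = [[0] * (k + 1) for _ in range(n + 1)]
    table[0][0] = 1
    for i in range(1, n + 1):
        for j in range(1, min(i, k) + 1):
            table[i][j] = table[i - 1][j - 1] + j * table[i - 1][j]
    return table[n][k]


def stop_probability(n, r, l):
    """P[sampling stops with r different individuals], single catches, l recaptures."""
    numerator = r * factorial(n) * stirling2(r + l - 1, r)
    return Fraction(numerator, n ** (r + l) * factorial(n - r))


def stirling_ratio_estimate(r, l):
    """Candidate unbiased estimate of n: S(r+l, r) / S(r+l-1, r)."""
    return Fraction(stirling2(r + l, r), stirling2(r + l - 1, r))


def check_unbiased(n, l):
    """Return (total probability, exact mean of the estimate) for population size n."""
    probs = [stop_probability(n, r, l) for r in range(1, n + 1)]
    mean = sum(p * stirling_ratio_estimate(r, l)
               for r, p in zip(range(1, n + 1), probs))
    return sum(probs), mean


for n in range(1, 6):
    print(n, check_unbiased(n, l=2))
```

For l = 1 the estimate reduces to $\binom{r+1}{2}$, since $\mathfrak{S}_{r+1}^{r} = \binom{r+1}{2}$ and $\mathfrak{S}_{r}^{r} = 1$.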

4-5. Does the ratio of two Stirling numbers afford an unbiased estimate of n for the direct
or inverse models? The answer is no, except in one unimportant instance: for the direct
census with $s \ge n$, $\mathfrak{S}_{s+1}^{r}/\mathfrak{S}_{s}^{r}$ is unbiased and has minimum variance.
The application of the three models of this section to the estimation of the number of
classes may be framed in the language of coupon collecting. (10), (12) and (16) correspond,
respectively, to the collection ceasing when the number of coupons (s), the number of
different kinds of coupon (r) and the number of ‘swop’ coupons $(s - r)$ reach prescribed
values.


5. ESTIMATION USING MODEL A 


5-1. It will be seen from (A') that r and $\{a_i\}$ are jointly sufficient for n and $\{p_i\}$. Confidence
interval estimation of n is therefore no longer a practical possibility.
Differentiating (A') with respect to $p_i$, we find that the maximum-likelihood estimate of
$p_i$ is

$$\hat{p}_i = a_i/\hat{n} \qquad (i = 1, 2, \ldots, s).$$

Hence, $\hat{n}$ is again the solution of (1).

The same estimation equations may also be deduced by the method of moments, for
$E[a_i] = np_i$ and $E[n - r] = nQ$.

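The derivation of formulae for the variance and bias of $\hat{n}$ is much the same as for model
(B). Writing $r = a_{s+1}$ and $Q = q_{s+1} = 1 - p_{s+1}$ for convenience, the solution of (1) is
$\hat{n} = \hat{n}[\{a_\alpha\}]$, $\alpha = 1, 2, \ldots, s+1$. $\hat{n}[\{np_\alpha\}] = n$ and

$$\hat{n} - n = \sum_\alpha (a_\alpha - np_\alpha)\frac{\partial\hat{n}}{\partial a_\alpha} + \frac{1}{2}\sum_\alpha (a_\alpha - np_\alpha)^2\frac{\partial^2\hat{n}}{\partial a_\alpha^2} + \sum_{\alpha<\beta}(a_\alpha - np_\alpha)(a_\beta - np_\beta)\frac{\partial^2\hat{n}}{\partial a_\alpha\,\partial a_\beta} + \ldots, \qquad (17)$$

where all derivatives are evaluated at $\{a_\alpha\} = \{np_\alpha\}$. It can be shown that any derivative of
$\hat{n}$ of order k is $O(n^{-k+1})$ for the limit process $n \to \infty$, $\{p_\alpha\}$ constant. Also, that all multinomial
moments of order $2l$ are $O(n^l)$ and of order $2l+1$ are $O(n^l)$. These two facts, combined with
several pages of tedious algebra, lead to

$$\operatorname{var}(\hat{n}) \sim n\left[\frac{1}{Q} + s - 1 - \sum_i\frac{1}{q_i}\right]^{-1} \qquad (18)$$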
[e-1-32] + [2-1-2 
and pr == i di i Qi (19) 





*  gte-3-ze] 


5-2. When s = 2,
$$n' = \frac{(a_1+1)(a_2+1)}{u_{12}+1} - 1 = \frac{u_1u_2}{u_{12}+1} + r$$
is again an almost unbiased estimate of n. For

$$\sum_{u_1,u_2,u_{12}=0}^{n}\frac{n!}{(n-r)!\,u_1!\,u_2!\,u_{12}!}\,Q^{n-r}P_1^{u_1}P_2^{u_2}P_{12}^{u_{12}} = 1,$$

where $Q = q_1q_2$, $P_1 = p_1q_2$, $P_2 = q_1p_2$, $P_{12} = p_1p_2$, and it follows easily that
$$E[n'] = n - nq_1q_2(1 - p_1p_2)^{n-1}. \qquad (20)$$
Seeing that $(1 - p_1p_2)^{n-1}$ is approximated well by $e^{-E[u_{12}]}$,
$$E[n'] = n - E[n-r]\,e^{-E[u_{12}]} \qquad (21)$$
to a good approximation. The negative bias of $n'$ will in general be small.
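The exact expectation of $n' = (a_1+1)(a_2+1)/(u_{12}+1) - 1$ can be verified for small n by summing over the multinomial distribution of the numbers caught in the first sample only, the second only, and both. The sketch below is illustrative: the function names and the chosen values of n, $p_1$, $p_2$ are assumptions, and rational arithmetic is used so that the comparison with $n - nq_1q_2(1-p_1p_2)^{n-1}$ is exact.

```python
from fractions import Fraction
from math import factorial


def exact_mean_n_prime(n, p1, p2):
    """E[n'] by direct enumeration of (u1, u2, u12) under the two-sample model."""
    q1, q2 = 1 - p1, 1 - p2
    prob_none, prob_1, prob_2, prob_12 = q1 * q2, p1 * q2, q1 * p2, p1 * p2
    total = Fraction(0)
    for u1 in range(n + 1):
        for u2 in range(n + 1 - u1):
            for u12 in range(n + 1 - u1 - u2):
                u0 = n - u1 - u2 - u12          # individuals never caught
                ways = factorial(n) // (factorial(u0) * factorial(u1)
                                        * factorial(u2) * factorial(u12))
                prob = (ways * prob_none ** u0 * prob_1 ** u1
                        * prob_2 ** u2 * prob_12 ** u12)
                a1, a2 = u1 + u12, u2 + u12     # the two sample sizes
                n_prime = Fraction((a1 + 1) * (a2 + 1), u12 + 1) - 1
                total += prob * n_prime
    return total


def closed_form_mean(n, p1, p2):
    """n - n q1 q2 (1 - p1 p2)^(n-1)."""
    q1, q2 = 1 - p1, 1 - p2
    return n - n * q1 * q2 * (1 - p1 * p2) ** (n - 1)


p1, p2 = Fraction(1, 3), Fraction(1, 4)
print(exact_mean_n_prime(6, p1, p2) == closed_form_mean(6, p1, p2))
```

The bias term decays geometrically in n, which is why $n'$ is described as almost unbiased once the expected number of recaptures $np_1p_2$ is moderately large.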


Consider now the conditional expectation of $n'$ given $a_1$. This leads to a slightly different
statement of the last result which we shall require in § 5-3. For this purpose write

$$n' = a_1 + (a_1+1)\frac{u_2}{u_{12}+1}.$$
Given $a_1$, $u_2$ and $u_{12}$ are independent binomial variables $B[n - a_1, p_2]$ and $B[a_1, p_2]$, respectively,
and

$$E[u_2 \mid a_1] = (n - a_1)p_2, \qquad E\!\left[\frac{1}{u_{12}+1}\,\Big|\, a_1\right] = \frac{1}{(a_1+1)p_2}\left[1 - q_2^{a_1+1}\right].$$

Therefore
$$E[n' \mid a_1] = a_1 + (n - a_1)\left[1 - q_2^{a_1+1}\right]$$
$$= n - (n - a_1)\,q_2(1 - p_2)^{a_1} \qquad (22)$$
$$= n - E[n - r \mid a_1]\,e^{-E[u_{12}\mid a_1]} \qquad (23)$$

to the same approximation as before. The expectation of (22) over $a_1$ plainly gives (20)
again, but the inference we wish to make is that the bias may be neglected after taking only
the conditional expectation and, what is more, that the difference between (22) and (20)
is negligible.

From (18),
$$\operatorname{var}(n') \sim n\left[\frac{1}{q_1q_2} + 1 - \frac{1}{q_1} - \frac{1}{q_2}\right]^{-1} \qquad (24)$$
$$= n\,\frac{q_1q_2}{p_1p_2}$$
$$= n\,\frac{E[n-r]}{E[u_{12}]}. \qquad (25)$$

5-3. In order to consider Chapman’s recommendation (1952) for an s-sample unbiased
estimate of n, let $a_{<k}$ denote the number of different individuals captured before the kth
sample, $k = 2, 3, \ldots, s$. (Thus $a_{<2} = a_1$.) Further, let $u_{<k,k}$ denote the number of different
individuals captured before and at the kth sample. Thus, for instance, if s = 4,

$$u_{<3,3} = u_{13} + u_{23} + u_{123} + u_{134} + u_{234} + u_{1234}.$$
Clearly, $a_{<k+1} = a_{<k} + a_k - u_{<k,k}$
and $n - a_{<k+1}$, $a_{<k} - u_{<k,k}$, $a_k - u_{<k,k}$, $u_{<k,k}$
are distributed multinomially with parameters n and

$$q_1\cdots q_k, \quad (1 - q_1\cdots q_{k-1})\,q_k, \quad q_1\cdots q_{k-1}\,p_k, \quad (1 - q_1\cdots q_{k-1})\,p_k.$$
Therefore, in the same way as for s = 2,
$$n'_k = \frac{(a_{<k}+1)(a_k+1)}{u_{<k,k}+1} - 1 \qquad (k = 2, 3, \ldots, s)$$
is an almost unbiased estimate of n if, as we shall assume throughout this subsection,
sampling is large enough. We notice that the covariance of any two of these estimates is
negligible compared with their variances. For, if $j < k$,
$$E[n'_j(n'_k - E[n'_k]) \mid n'_j, a_{<k}] = n'_j\,(E[n'_k \mid a_{<k}] - E[n'_k]),$$

which is negligible compared with $\operatorname{var}(n'_k)$. (Compare n times the difference between (21)
and (23) with (25).)
So far, we can say that $\sum_{k=2}^{s}\lambda_k n'_k$, where $\sum_k \lambda_k = 1$, is an almost unbiased estimate of n and
$\operatorname{var}(\sum_k \lambda_k n'_k) = \sum_k \lambda_k^2\operatorname{var}(n'_k)$. Now, from (24)

$$\operatorname{var}(n'_k) \sim n\left[\frac{1}{q_1\cdots q_k} + 1 - \frac{1}{q_1\cdots q_{k-1}} - \frac{1}{q_k}\right]^{-1} = I_k^{-1}, \text{ say}.$$

$I_k$ may be described as the information on n contained in $n'_k$. Similarly, let I be the information
contained in $\hat{n}$, the original s-sample biased estimate of n. Then, from (18)

$$I = \frac{1}{n}\left[\frac{1}{q_1\cdots q_s} + s - 1 - \sum_i\frac{1}{q_i}\right]. \qquad (26)$$

We notice that $\sum_{k=2}^{s} I_k = I$, which means that the $n'_k$ contain between them as much information
on n as $\hat{n}$. However, the only linear function $\sum_k \lambda_k n'_k$ which uses all of the information
is the one for which $\lambda_k = I_k/I$. The $I_k$ are unknown and to substitute their estimates would
introduce bias which the use of $\sum_k \lambda_k n'_k$ aims to avoid. Chapman proposed the arithmetic
mean $\sum_k n'_k/(s-1) = n^*$ say. $n^*$, however, involves a loss of information of amount
$$I - (s-1)^2\left[\sum_k I_k^{-1}\right]^{-1},$$
which will not in general be worth forfeiting as the $I_k$ will most likely be fairly disparate.
Accordingly, as a general rule, we may say that $\hat{n} - b$, where b is the estimated value of $\beta$,
is preferable to $n^*$.


The $n'_k$ may be called backward estimates of n in contrast to the forward estimates $n''_k$
defined as
$$n''_k = \frac{(a_k+1)(a_{>k}+1)}{u_{k,>k}+1} - 1 \qquad (k = 1, 2, \ldots, s-1),$$
where $a_{>k}$ denotes the number of different individuals caught after the kth sample and
$u_{k,>k}$ the number which are also caught at the kth sample. $\sum_k \lambda_k n''_k$ can be discussed in very
much the same way as $\sum_k \lambda_k n'_k$ and, theoretically speaking, there is nothing to choose between
them. There is an important practical difference, however: while the $n'_k$ can be evaluated
from similar tagging, the $n''_k$ require differentiated tagging.


5-4. If ‘catchability’ is constant throughout the experiment, $p_i$ will differ from $p_j$ only
if the amount of effort expended on the ith sample differs from that expended on the jth.
More precisely, if we know that $e_i$ units of effort are expended on the ith sample, we may write

$$q_i = e^{-\alpha e_i} \qquad (i = 1, 2, \ldots, s).$$

This follows from writing the probability of any individual escaping capture when subjected
to $de$ units of catching effort as $1 - \alpha\,de + o(de)$. Previous authors have taken $p_i = \alpha e_i$ which
is an approximation, albeit a perfectly valid one, to the above relation.

Incorporating $q_i = e^{-\alpha e_i}$ in (A'), it is found that the maximum likelihood estimates of
$\alpha$ and n are obtained by solving

$$n - r = n e^{-\alpha\sum_i e_i} \quad\text{and}\quad n\sum_i e_i = \sum_i\frac{a_i e_i}{1 - e^{-\alpha e_i}}.$$
The method of moments versions of these equations are

$$r = E[r] \quad\text{and}\quad \sum_i\frac{e_i}{p_i}\,(a_i - E[a_i]) = 0.$$

Using the approximation $p_i = \alpha e_i$, the latter equation becomes $\sum_i a_i = E[\sum_i a_i]$.


Is it worth including knowledge of the effort in the probability model? To answer this
question as speedily as possible, consider the case when it is known that all $e_i$ are equal and
therefore all $p_i = p$, say. The equation for n is

$$(n - \bar{a})^s = n^{s-1}(n - r),$$

with solution $\hat{n}_e$ say, where $\bar{a} = \frac{1}{s}\sum_i a_i$. We find that
$$\operatorname{var}(\hat{n}_e) \sim n\left[\frac{1}{q^s} + s - 1 - \frac{s}{q}\right]^{-1},$$
which is the same as (18) when $q_i = q$. Thus, in the limit, no information is gained by using
the knowledge that an equal amount of effort has been given to each sample. This may be
attributed to the fact that the knowledge would in any case have emerged from the sampling,
since as $n \to \infty$, $\operatorname{plim}[a_i/n - a_j/n] = 0$ and therefore also $\operatorname{plim}[a_i/\hat{n} - a_j/\hat{n}] = 0$. Clearly,
the conclusion of no gain in information as $n \to \infty$ may be extended to the case when the $e_i$
are known and unequal.

For finite n, knowledge of the effort will produce only a second-order increase in in- 
formation. This would not be negligible for small populations, but it is when estimating 
small populations that the experimenter would be most wary of assuming that catchability 
is constant. For this reason, it will be better in all cases to make the p; free parameters to 
be estimated solely from the sampling. On the other hand, this inference is valid only when 
the population is closed, that is when there is no natural or artificial alteration of the 
population size. Otherwise, knowledge of the effort is very useful. For its use in the fisheries 
type of census see Chapman (1954) for instance. 


5-5. From (26), the information as a function of effort is
$$I = \frac{1}{n}\left[e^{\alpha\sum_i e_i} + s - 1 - \sum_i e^{\alpha e_i}\right].$$

Suppose that catchability, represented by $\alpha$, is constant and that there is a fixed total amount
of effort for the whole experiment. Writing $\alpha\sum_i e_i = c$ we see that $Q = e^{-c}$ is fixed and consequently
also $E[r]$. Also, we have that

$$I = \frac{1}{n}\left[e^{c} - 1 - c - \frac{\alpha^2}{2!}\sum_i e_i^2 - \frac{\alpha^3}{3!}\sum_i e_i^3 - \ldots\right].$$

For given s, $\sum_{i=1}^{s} e_i^k$ is a minimum and therefore I is a maximum when all the $e_i$ are equal.
When they are equal
$$I = \frac{1}{n}\left[e^{c} - 1 - c - \frac{c^2}{2!}\,\frac{1}{s} - \frac{c^3}{3!}\,\frac{1}{s^2} - \ldots\right]$$
and thus the information increases as s increases. Both of these conclusions are probably
little more than reflexions of the fact that equal effort per sample and an increased number
of samples give rise to a larger $E[\sum_i a_i]$ for given $E[r]$.
i 
More important, from the practical point of view, than knowing how to get maximum
information from fixed effort is to know how the information increases with increased effort.
Supposing that all $e_i = \bar{e}$ say,

$$I = \frac{\alpha^2}{2n}\,s(s-1)\,\bar{e}^2\left[1 + \tfrac{1}{3}\alpha(s+1)\bar{e} + \ldots\right].$$
Thus, if the effort per sample is enlarged to $k\bar{e}$, the information is enlarged to more than $k^2I$.
While if $\bar{e}$ is held constant and the number of samples increased from s to s + 1, the information
is multiplied by more than $(s+1)/(s-1)$; in other words, the information is roughly
proportional to the number of different pairs of samples, as might be expected.
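The behaviour of the information under different allocations of effort can be illustrated numerically. The sketch below assumes the relation $q_i = e^{-\alpha e_i}$ and evaluates $I = \frac{1}{n}[e^{\alpha\sum e_i} + s - 1 - \sum e^{\alpha e_i}]$ together with the first two terms of its expansion for equal efforts; the function names and the numerical values of n, $\alpha$ and the efforts are hypothetical.

```python
import math


def information(n, alpha, efforts):
    """I = (1/n)[exp(alpha * sum e_i) + s - 1 - sum exp(alpha * e_i)]."""
    s = len(efforts)
    total = math.exp(alpha * sum(efforts))
    return (total + s - 1 - sum(math.exp(alpha * e) for e in efforts)) / n


def leading_terms(n, alpha, s, e_bar):
    """First two terms of the expansion of I when every e_i equals e_bar."""
    head = alpha ** 2 * s * (s - 1) * e_bar ** 2 / (2.0 * n)
    return head * (1.0 + alpha * (s + 1) * e_bar / 3.0)


n, alpha = 1000, 0.1
equal = information(n, alpha, [5.0, 5.0])        # total effort 10, split evenly
unequal = information(n, alpha, [2.0, 8.0])      # same total, uneven split
more_samples = information(n, alpha, [2.5] * 4)  # same total, four samples
print(equal > unequal, more_samples > equal)

# For small alpha * e_bar the two-term expansion is close to the exact value.
print(information(100, 0.01, [1.0] * 3), leading_terms(100, 0.01, 3, 1.0))
```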


5-6. If s is large and each $p_i$ is small, the probability that any sample size is one will be
a small quantity of first order, while the probability that it is greater than one will be of
second order. Therefore, in the limit as $s \to \infty$ and each $p_i \to 0$, sampling becomes ‘continuous’
and each ‘sample’ is of size zero or one. In this way, we obtain a valid description
of the experiment for which only one individual is captured at a time.

To formulate this idea precisely, let $Q_x$ denote the coefficient of $z^x$ in $\prod_{i=1}^{s}(q_i + p_iz)$; that is,
let $Q_0 = Q$, $Q_1 = \sum_i P_i$, $Q_2 = \sum_{i<j} P_{ij}$, .... Then, summing (A) over values of $\{u_w\}$ such that
$\sum_i u_i = f_1$, $\sum_{i<j} u_{ij} = f_2$, ..., the density of $\{f_x\}$, where $f_0 = n - r$, is

$$\frac{n!}{\prod_{x=0}^{s} f_x!}\,\prod_{x=0}^{s} Q_x^{f_x}. \qquad (27)$$

Consider the limit process: $s \to \infty$, $\max\{p_i\} \to 0$ such that $\prod_i(1 - p_i) = Q$ remains fixed
and equal to $e^{-\lambda}$ say; that is to say, such that $\alpha\sum_i e_i$ remains fixed and equal to $\lambda$. For this
process, $\prod_i(q_i + p_iz) \to e^{\lambda(z-1)}$ and therefore $Q_x \to e^{-\lambda}\lambda^x/x!$. Thus, the limit of (27) is

$$\frac{n!}{\prod_{x=0}^{\infty} f_x!}\,\prod_{x=0}^{\infty}\left(\frac{e^{-\lambda}\lambda^x}{x!}\right)^{f_x}. \qquad (28)$$

Let us now redefine s as the random variable $\sum_{x=1}^{\infty} x f_x$, the total number of catches made.
(28) can then be written

$$\frac{n!\;e^{-n\lambda}\lambda^s}{(n-r)!\,(1!)^{f_1}(2!)^{f_2}\cdots f_1!\,f_2!\cdots}. \qquad (A_\lambda)$$

Craig (1953) postulated this model as an alternative to (10) and discussed the estimation of
$\lambda$ and n. Although the sampling is now a continuous process, it is by no means necessary
that it extends over only one interval of time. In practice, the experimenter will expend
effort until he catches an individual and then will pause while marking and recording it
and letting it return to the population. There is no conflict between this practice and $(A_\lambda)$.
The joint probability generating function of r and s for $(A_\lambda)$ is

$$E[y^r z^s] = E\!\left[\prod_{x=1}^{\infty}(yz^x)^{f_x}\right] = e^{-n\lambda}\left[1 + y(e^{\lambda z} - 1)\right]^n.$$
We notice that the marginal distribution of s is Poisson with parameter $n\lambda$ and therefore
that the conditional probability of $\{f_x\}$ given s is
$$p[\{f_x\} \mid s] = \frac{n!\;s!}{n^s\,(n-r)!\,(1!)^{f_1}(2!)^{f_2}\cdots f_1!\,f_2!\cdots},$$
which is (9) again. Thus, there are two routes from (A) to (9): either via (B) and $(B_1)$ or via $(A_\lambda)$.
The marginal distribution of r is binomial $B[n, 1 - e^{-\lambda}]$. The conditional probability of
$\{f_x\}$ given r is therefore
$$p[\{f_x\} \mid r] = \frac{r!\;\lambda^s}{(e^{\lambda} - 1)^r\,(1!)^{f_1}(2!)^{f_2}\cdots f_1!\,f_2!\cdots}. \qquad (29)$$

This density was discussed by Craig from the point of view of the truncated Poisson distribution
defined by the probability of a caught individual being caught x times being
$$\frac{e^{-\lambda}\lambda^x}{x!\,(1 - e^{-\lambda})},$$
but plainly cannot serve as a probability description of any possible way of conducting a
capture-recapture experiment; for it demands that both the total effort and the total number
of different individuals be fixed in advance, which is impossible.

5-7. We notice that (29) satisfies conditions (a), (b) and (c) given in § 3-2 for the applicability
of large-sample maximum likelihood theory. This is also true of the counterpart of
(29) for the general case. From (A) we have

$$p[\{u_w\} \mid r] = \frac{r!}{\prod_w u_w!}\,\prod_w\left(\frac{P_w}{1-Q}\right)^{u_w}. \qquad (30)$$
(30), like (29), does not mention the uncaught individuals and does not truly describe any
experiment. However, densities of this type are important as theoretical devices, as we can
demonstrate by applying standard maximum likelihood theory to (30).
Maximizing $L = \log p[\{u_w\} \mid r]$ with respect to $\{p_i\}$, we obtain the equations
$$\frac{a_i}{\hat{p}_i} = \frac{r}{1 - \hat{Q}} \qquad (i = 1, 2, \ldots, s). \qquad (31)$$
Let $\theta = (1 - Q)^{-1}$. Then $\hat\theta = (1 - \hat{Q})^{-1}$ is the maximum likelihood estimate of $\theta$. (31) implies
that $\hat\theta$ satisfies the equation $\prod_i(r\hat\theta - a_i) = (r\hat\theta)^{s-1}(r\hat\theta - r)$, that is $r\hat\theta$ satisfies (1). Therefore
$r\hat\theta = \hat{n}$.
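We can obtain the asymptotic mean square error of $\hat\theta$ for (30) as $r \to \infty$ by first finding
the information matrix, $V^{-1}$ say, whose (i, j) element is $-E[\partial^2L/\partial p_i\,\partial p_j]$. We find that

$$V^{-1} = \frac{rQ}{(1-Q)^2}\,DWD,$$

where D is the diagonal matrix whose (i, i) element is $q_i^{-1}$ and W is the matrix whose (i, i)
element is $w_i - 1$, where $w_i = q_i(1-Q)/(p_iQ)$, and whose every other element is $-1$.

Now, as $r \to \infty$, $E[(\hat\theta - \theta)^2 \mid r] \sim d'Vd$,
where d is the vector whose ith element is the derivative $\partial\hat\theta/\partial\hat{p}_i$ evaluated at $\{\hat{p}_i\} = \{p_i\}$.
On differentiating $\hat\theta = [1 - \prod_i(1 - \hat{p}_i)]^{-1}$, it is found that
$$d = -\frac{Q}{(1-Q)^2}\,D\mathbf{1},$$
where $\mathbf{1}$ is the vector all of whose elements are 1. Hence
$$E[(\hat\theta - \theta)^2 \mid r] \sim \frac{Q}{r(1-Q)^2}\,\mathbf{1}'DD^{-1}W^{-1}D^{-1}D\mathbf{1} = \frac{Q}{r(1-Q)^2}\,\mathbf{1}'W^{-1}\mathbf{1}.$$
To find $\mathbf{1}'W^{-1}\mathbf{1}$, let $W^{-1}\mathbf{1} = x$. Then $Wx = \mathbf{1}$ and the ith element of x is seen to be
$x_i = w_i^{-1}/(1 - \sum_j w_j^{-1})$. Therefore

$$E[(\hat\theta - \theta)^2 \mid r] \sim \frac{Q}{r(1-Q)^2}\,\mathbf{1}'x = \frac{Q}{r(1-Q)^2}\,\frac{\sum_i w_i^{-1}}{1 - \sum_i w_i^{-1}}.$$

Substituting for $w_i$,
$$E[(\hat\theta - \theta)^2 \mid r] \sim \frac{1}{r}\,\frac{Q}{(1-Q)^2}\,\frac{\sum_i p_i/q_i}{1/Q - 1 - \sum_i p_i/q_i} = \frac{\omega}{r}, \text{ say}.$$

By means of the $\delta$-technique we can further say that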
/ l \ 
BOO |r] = +73 +0( 5) 
(32) 
* me. We Fer A 
and E((O 0)|r]="+2+0(5).| 


Using (32) we can re-derive formula (18) for the asymptotic value of $E[(\hat{n} - n)^2]$. For,
since $\theta = (1 - Q)^{-1}$ and $\hat{n} = r\hat\theta$,

$$E[(\hat{n} - n)^2 \mid r] = r^2E[(\hat\theta - \theta)^2 \mid r] + \theta^2(r - n(1-Q))^2 + 2r\theta(r - n(1-Q))\,E[(\hat\theta - \theta) \mid r]. \qquad (33)$$
In evaluating the expectation of the right-hand side of (33) over r, we consider the range
$kn < r \le n$, where $0 < k < 1 - Q$ and kn is integral. This approximation permits the use of the
asymptotic formulae (32) and produces errors which asymptotically are negligible. Two
observations are sufficient to establish the latter claim. First, if $P_r$ denotes the probability
of r,

$$\sum_{r=0}^{kn} P_r < P_{kn}\,\frac{(n - kn + 1)(1-Q)}{(n+1)(1-Q) - kn}$$

(Feller, 1957, p. 140), and the last quantity is $O(n^{-1/2}c^n)$, where

$$c = \left(\frac{1-Q}{k}\right)^{k}\left(\frac{Q}{1-k}\right)^{1-k} < 1.$$
Secondly, $\hat\theta < \phi(s)\,r$ provided always that we ignore the possibility that $\sum_i a_i = r$ which
makes $\hat\theta$ infinite. Substituting from (32) we find that
$$E[(\hat{n} - n)^2] = \omega E[r] + \theta^2\operatorname{var}(r) + O(1)$$

$$= n\left[\frac{1}{Q} + s - 1 - \sum_i\frac{1}{q_i}\right]^{-1} + O(1),$$

which is (18).
The rederivation of (18) is not important in itself. However, when combined with the
fact that $\hat\theta$ is asymptotically efficient for $\theta$, the above argument has an important consequence.
Namely, that for the class of estimates $n^* = r\theta^*$ which satisfy the very reasonable
conditions


E{( 27 y=" +33 +0(5) 


(34) 
E{(0*—0)|r] = 144 246 03): 
$\hat{n}$ has asymptotically minimum mean square error since $\omega$ is the minimum attainable value
of $A_1$. (Conditions (34) are sufficient for this conclusion. They are quite possibly not
necessary.) (34) implies that




E{(n* — n)?] = on +5 +0(1) 


n 


35 
and E(n* —n] = f+" +0(7), (35) 


which are even more reasonable. Unfortunately, (35) does not imply (34), though it is 
difficult to imagine estimates which satisfy (35) but not (34). 

In conclusion: among a wide class of estimates of n, those derived from (A) or $(A_\lambda)$ by
the method of maximum likelihood are asymptotically best in that they have minimum
mean square error.


I wish to acknowledge my considerable indebtedness to the referee for his invaluable 
comments on two previous drafts of this paper. 
CONFIDENCE INTERVALS FOR DISTANCE
IN THE ANALYSIS OF VARIANCE 


By M. G. BULMER 


Department of Social and Preventive Medicine, University of Manchester 


1. INTRODUCTION 


This paper is concerned with the so-called finite model (or Model I) of the analysis of variance.
It is assumed that we are given n observations, $y_1, \ldots, y_n$, which are normally and independently
distributed with the same variance, $\sigma^2$, about mean values which can be expressed in
the form

$$E(y_i) = \sum_{j=1}^{k} x_{ij}\beta_j, \qquad (1)$$

where the $x_{ij}$’s are known, constant, coefficients and the $\beta_j$’s are fixed but unknown parameters.
This can be written in matrix notation as

$$E(y) = X\beta, \qquad (2)$$

where y is the $n \times 1$ column vector of observations, X is the $n \times k$ matrix of the $x_{ij}$’s and $\beta$ is
the $k \times 1$ column vector of parameters. It is assumed that the model is stated in such a way
that X is of rank k (this can always be done by eliminating any redundant parameters). The
statistical problem is to test the null hypothesis, which specifies the values of the first
r parameters, $\beta_1, \ldots, \beta_r$ $(r < k)$. Any hypothesis which specifies the values of r linear, independent,
functions of the parameters can be put into this form by a suitable re-statement of
the model.

It is convenient to partition X into $X_1$ and $X_2$, where $X_1$ consists of the first r columns of
X and $X_2$ of the remaining $k - r$ columns. $\beta$ likewise can be partitioned into $\gamma$ and $\zeta$, where $\gamma$
contains the first r parameters and $\zeta$ the remaining $k - r$ parameters. Thus $\gamma$ is the vector
containing the true values of the parameters in which we are interested, while $\zeta$ is to be
treated as a vector of nuisance parameters. It will also be convenient to write $\gamma_0$ for the set
of parameter values specified by the null hypothesis, $\hat\gamma$ for the least squares estimate of
$\gamma$, and $\gamma_1$ and $\gamma_2$ for arbitrary points in the hypothesis space, $\Gamma$, containing all possible sets
of values of the first r parameters.

If $S_1$ is the sum of squares testing the null hypothesis, $\gamma_0$, then

$$S_1 = (\hat\gamma - \gamma_0)'B(\hat\gamma - \gamma_0), \qquad (3)$$
where
$$B = C_{11} - C_{12}C_{22}^{-1}C_{21} \qquad (4)$$

(where $C_{ij} = X_i'X_j$ for i, j = 1, 2); (this follows fairly easily from the results given in Kempthorne
(1952, pp. 59-61)). When the null hypothesis is true, $S_1/\sigma^2$ is distributed as a chi-square
variate with r degrees of freedom and $S_2/\sigma^2$, where $S_2$ is the error sum of squares, is
independently distributed as a chi-square variate with $n - k$ degrees of freedom. When the
null hypothesis is false, $S_1/\sigma^2$ is distributed as a non-central chi-square variate with r degrees
of freedom and non-centrality parameter $\lambda$, where

$$\lambda\sigma^2 = \delta = (\gamma - \gamma_0)'B(\gamma - \gamma_0). \qquad (5)$$
This expression has been obtained by substituting $\gamma$ for $\hat\gamma$ in the right-hand side of equation
(3). The distribution of $S_2$ is unchanged.

The purpose of this paper is to construct two types of confidence interval for $\delta = \lambda\sigma^2$. It
will be shown in § 2 that $(\delta/n)^{1/2}$ is a useful measure of the ‘distance’ of the true hypothesis,
$\gamma$, from the null hypothesis, $\gamma_0$; it is hoped that a confidence interval on this measure will be
useful when one wants to know how good an approximation to the truth the null hypothesis
is. In § 3 a ‘simultaneous’ confidence interval for $\delta$ is developed which can be used at the
same time as Scheffé’s (1953) simultaneous intervals on the individual parameters or linear
functions of them. In § 4, an approximate confidence interval, in the ordinary sense of the
word, is constructed for $\delta$.


2. THE MEASURE OF DISTANCE 


The power of the analysis of variance depends on the ratio $\lambda = \delta/\sigma^2$, and previous work has
been confined to this ratio. Confidence limits can be placed on $\lambda$, in principle exactly by
using the exact non-central F-distribution (see equation (33)), in practice approximately by
using Patnaik’s approximation to this distribution (see equations (35) and (36)). Usually,
however, the error variance is of no theoretical interest, and we should prefer to have information
about $\delta$ rather than about $\delta/\sigma^2$.

The interpretation of $(\delta/n)^{1/2}$ as a natural measure of the distance of the true from the null
hypothesis rests on the following two properties. First, let us for arbitrary $\gamma_1$ and $\gamma_2$ define

$$d(\gamma_1 - \gamma_2) = (\gamma_1 - \gamma_2)'B(\gamma_1 - \gamma_2), \qquad (6)$$

so that $d(\gamma - \gamma_0) = \delta$. Then $d^{1/2}(\gamma_1 - \gamma_2)/n^{1/2}$ is a true metric function on the $\Gamma$-space; in fact it is
the ordinary Euclidean distance on a linear transformation of the $\Gamma$-space since B is positive
definite. Secondly, for an arbitrary vector of nuisance parameters $\zeta_1$, $\delta$ is the minimum value
of the sum of squares

$$(X_1\gamma + X_2\zeta - X_1\gamma_0 - X_2\zeta_1)'(X_1\gamma + X_2\zeta - X_1\gamma_0 - X_2\zeta_1) \qquad (7)$$

over all $\zeta_1$. This can easily be shown by differentiating (7) with respect to $\zeta_1$ and equating to
zero. Thus $\delta$ is the minimum sum of the squared deviations of the true model, $X\beta$, from
the model, $X_1\gamma_0 + X_2\zeta_1$, partially specified by the null hypothesis. There are n of these
deviations and so $(\delta/n)^{1/2}$ is a natural measure of the distance of the true hypothesis from the
null hypothesis. In the case of an orthogonal design

$$\delta = (\gamma - \gamma_0)'C_{11}(\gamma - \gamma_0). \qquad (8)$$
In practice, $\delta$ is often most easily calculated from the formula
$$\delta = E(S_1) - r\sigma^2. \qquad (9)$$


For example, in a randomized block design, in which the ith treatment has been replicated 
n,; times and has an average effect 7; (subject to the restriction X7; = 0), it is easily seen that 
6 = &7?, when the null hypothesis is that all the treatment effects are zero. 

The techniques developed in this paper can be applied to the chi-square test of goodness
of fit as well as to the analysis of variance. For, as Patnaik (1949) has shown, the chi-square
criterion is, approximately, distributed as a non-central chi-square variate with non-centrality
parameter
$$\lambda = n\sum_i \pi_i\left(\frac{p_i - \pi_i}{\pi_i}\right)^2, \qquad (10)$$
where $\pi_i$ is the theoretical and $p_i$ the true probability of an observation falling in the ith class;
the validity of the approximate expression for $\lambda$ depends on $(\pi_i - p_i)$ being of order $n^{-1/2}$.
Thus $\lambda/n$ can be regarded as a weighted sum of squared proportional deviations of $p_i$ from
$\pi_i$, where the weights add up to 1 and so $(\lambda/n)^{1/2}$ is a natural measure of the distance between
the true and the null hypotheses. The problem of placing a confidence interval on this
quantity is formally the same as that of placing a confidence interval on $(\delta/n)^{1/2}$ in the analysis
of variance when it is known that $\sigma^2 = 1$, that is when the error mean square is equal to 1 and
has infinite degrees of freedom. It is conjectured that a similar procedure is valid in other
applications of the chi-square technique (for example, when some of the parameters must be
estimated from the sample).


3. A SIMULTANEOUS CONFIDENCE INTERVAL 


Let us write $S_1(\gamma_1)$ for the sum of squares testing a particular hypothesis, $\gamma_1$. If we assert
that the true value, $\gamma$, satisfies the inequality

$$S_1(\gamma) \le rM_2F_\alpha, \qquad (11)$$
where $M_2 = S_2/(n - k)$ and $F_\alpha$ is the upper $100\alpha\%$ point of the F-distribution with r and
$n - k$ degrees of freedom, then the probability of being correct will be $1 - \alpha$. If, therefore, we
find the minimum and maximum values of $d(\gamma_1 - \gamma_0)$ defined in equation (6), over all values
of $\gamma_1$ satisfying the inequality
$$S_1(\gamma_1) \le rM_2F_\alpha \qquad (12)$$
and assert that $\delta$ lies between these extremes, we shall be correct with a probability of at
least $1 - \alpha$.

We first observe that the extreme values of $d(\gamma_1 - \gamma_0)$ over all values of $\gamma_1$ within the
region defined by equation (12) are the same as the extreme values of $d(\gamma_1 - \gamma_0)$ over all
values of $\gamma_1$ on the surface of the region, except that the minimum value when $\gamma_0$ lies inside
the region should obviously be zero. This follows from the fact that $d^{1/2}(\gamma_1 - \gamma_2)$ can be regarded
as ordinary Euclidean distances on a linear transformation of the $\Gamma$-space. These extreme
values are given by the following theorem.


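THEOREM. The extreme values of $d(\gamma_1 - \gamma_0)$ over all values of $\gamma_1$ on the surface $S_1(\gamma_1) = c$
are $[S_1^{1/2}(\gamma_0) \pm c^{1/2}]^2$.

PROOF. $S_1(\gamma_1)$ is obtained by substituting $\gamma_1$ for $\gamma_0$ in equation (3). $d(\gamma_1 - \gamma_0)$ is defined by
substituting the fixed value $\gamma_0$ for the variable $\gamma_2$ in (6). Differentiating

$$d(\gamma_1 - \gamma_0) - mS_1(\gamma_1), \qquad (13)$$
with respect to $\gamma_1$ (where m is an arbitrary Lagrangian multiplier) and equating to zero, we
obtain
$$(\gamma_1 - \gamma_0)'B - m(\hat\gamma - \gamma_1)'B = 0. \qquad (14)$$
Post-multiplying by $B^{-1}$ and transposing,

$$\gamma_1 - \gamma_0 = m(\hat\gamma - \gamma_1), \qquad (15)$$
which on re-arrangement gives

$$(1 + m)(\gamma_1 - \gamma_0) = m(\hat\gamma - \gamma_0). \qquad (16)$$

Hence, from (15)
$$(\gamma_1 - \gamma_0)'B(\gamma_1 - \gamma_0) = m^2(\hat\gamma - \gamma_1)'B(\hat\gamma - \gamma_1) = m^2c, \qquad (17)$$
and from (16)

$$(1 + m)^2(\gamma_1 - \gamma_0)'B(\gamma_1 - \gamma_0) = m^2(\hat\gamma - \gamma_0)'B(\hat\gamma - \gamma_0) = m^2S_1(\gamma_0). \qquad (18)$$

Eliminate m from (17) and (18) and the theorem is proved.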
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Thus if we write rM, H, = c and assert 

(Stet)? <3 <(S$+ ety, (19) 
where S, = S,(yo), except that the lower limit is to be taken as zero when S, <c, the prob- 
ability of being correct is at least 1— a. This is an extension of Scheffé’s (1953) method for 
placing simultaneous confidence intervals on any number of linear contrasts; for any one of 
Scheffé’s intervals is the shortest interval implied by the inequality (11). Thus we can place 
confidence intervals on any number of linear contrasts by Scheffé’s method and at the same 
time on the ‘distances’ of the true hypothesis from any number of null hypotheses by the 
method of this section, and still be certain that the probability that all these intervals are 
correct is at least 1—«, since they are all implied by (11). If, however, we only want a con- 
fidence interval on the distance of the true hypothesis from one null hypothesis, then we 
should be able to find a considerably shorter interval. This problem is considered in the next 
section. 

Example 1. The data in Table 1 relate to the head breadths of 142 skulls belonging to
three series and are taken from Rao (1952). An analysis of variance on these data is given in
Table 2. To calculate the 95 % simultaneous confidence interval on the distance of the true
hypothesis from the null hypothesis that the head breadths of the three series are the same,
we find $c = rM_2F_\alpha = 2 \times 31.50 \times 3.06 = 192.8$, $c^{1/2} = 13.89$. Hence the limits for $\delta^{1/2}$ are
$S_1^{1/2} \pm c^{1/2}$, or 1.56 and 29.34, and the required confidence interval for $(\delta/n)^{1/2}$ is 0.131 to 2.462 mm.


Table 1

Series           Sample size     Mean head breadth (mm.)
(illegible)           83                 135.87
(illegible)           51                 138.22
(illegible)            8                 137.75


Table 2. Analysis of variance

Item               D.F.       S.S.        M.S.       F
Between series        2      238.59      119.29     3.79
Residual            139     4378.05       31.50
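
The arithmetic of Example 1 can be retraced with a few lines of Python. This is only a sketch of the simultaneous interval (19) applied to the figures quoted in Tables 1 and 2; the function name and argument names are illustrative and do not come from the paper.

```python
import math

def simultaneous_distance_interval(s1, r, m2, f_alpha, n):
    """Limits for (delta/n)^(1/2) from S1^(1/2) -/+ c^(1/2), where c = r*M2*F_alpha."""
    c = r * m2 * f_alpha
    root_s1, root_c = math.sqrt(s1), math.sqrt(c)
    lower = max(root_s1 - root_c, 0.0)   # lower limit is taken as zero when S1 < c
    upper = root_s1 + root_c
    scale = math.sqrt(n)
    return lower / scale, upper / scale

# Figures from Example 1: S1 = 238.59, r = 2, M2 = 31.50, F_alpha = 3.06, n = 142
low, high = simultaneous_distance_interval(s1=238.59, r=2, m2=31.50, f_alpha=3.06, n=142)
print(round(low, 3), round(high, 3))
```

The result agrees with the interval quoted in the example up to rounding of the intermediate values.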





So far we have only considered the case of testing a single hypothesis, $\gamma = \gamma_0$. If, however,
there are several factors in the experiment, we may want to consider the effects of these
factors (and of their interactions) separately. That is to say, we want to split up $\gamma$ into $m$
exclusive components, $\alpha_1, \ldots, \alpha_m$, where $\alpha_i$ contains $r_i$ of the parameters in $\gamma$ and $\sum r_i = r$. We
then want to set limits on the distance measure $\delta_i$ of $\alpha_i$ from $\alpha_{i0}$ separately for $i = 1, \ldots, m$.
Now, if we write $S_1(\alpha_i)$ for the sum of squares testing the sub-hypothesis $\alpha_i$, then
$S_1(\alpha_i) \le S_1(\gamma)$ and so equation (11) implies
$$S_1(\alpha_i) < rM_2F_\alpha. \tag{20}$$
We can thus assert simultaneously for all $i$
$$[S_1^{1/2}(\alpha_{i0}) - c^{1/2}]^2 < \delta_i < [S_1^{1/2}(\alpha_{i0}) + c^{1/2}]^2, \tag{21}$$
except that, as before, the lower limit is to be taken as zero when $S_1(\alpha_{i0}) < c$.

Example 2. An example from genetics will illustrate this procedure. Mather (1951) quotes
the following data of Philp (1934) on the joint segregation of the two factors $p$ and $t$ in the
poppy. In a backcross progeny Philp observed the frequencies given in Table 3. The three
components of a chi-square analysis are given in Table 4, together with the simultaneous
confidence intervals on the distances from the null hypotheses that: (1) the $P, p$ segregation
is 1:1; (2) the $T, t$ segregation is 1:1; and (3) the two segregations are independent. The
confidence intervals have been obtained by taking $c = 7.815$, the upper 5 % point of the
chi-square distribution with 3 degrees of freedom. We can conclude that the average
(i.e. root mean square) proportional discrepancies of the $P, p$ and the $T, t$ segregations from
their theoretical values are at most 15.3 and 15.7 % respectively, whereas the proportional
discrepancy from the hypothesis of independence is at least 55.8 %.























Table 3

                            PpTt      Pptt      ppTt      pptt      Total
Observed                     191        37        36       203        467
Expected (with no linkage) 116.75    116.75    116.75    116.75       467





























Table 4

                                                  Confidence interval
Item                      $\chi^2$      D.F.      on $(\delta_i/n)^{1/2}$
Segregation for P, p        0.259        1        0 to 0.153
Segregation for T, t        0.362        1        0 to 0.157
Joint segregation         220.645        1        0.558 to 0.817
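
The entries of Table 4 can be reproduced from the counts of Table 3. The sketch below assumes the usual decomposition of chi-square for a 1:1:1:1 expectation into three orthogonal single-degree-of-freedom contrasts (each component being the squared contrast divided by the total); the contrast weights and all names are illustrative rather than taken from the paper.

```python
import math

observed = {"PpTt": 191, "Pptt": 37, "ppTt": 36, "pptt": 203}
counts = [observed[k] for k in ("PpTt", "Pptt", "ppTt", "pptt")]
n = sum(counts)
c = 7.815  # upper 5 % point of chi-square with 3 d.f., as quoted in Example 2

# Assumed orthogonal contrasts among the four classes
contrasts = {
    "P, p": (1, 1, -1, -1),
    "T, t": (1, -1, 1, -1),
    "joint": (1, -1, -1, 1),
}

def interval(chi2, c, n):
    """Limits for (delta_i/n)^(1/2) from equation (21)."""
    root = math.sqrt(chi2)
    lower = max(root - math.sqrt(c), 0.0)   # zero when the component is below c
    upper = root + math.sqrt(c)
    return lower / math.sqrt(n), upper / math.sqrt(n)

results = {}
for name, weights in contrasts.items():
    chi2 = sum(w * x for w, x in zip(weights, counts)) ** 2 / n
    results[name] = (chi2, *interval(chi2, c, n))

for name, (chi2, low, high) in results.items():
    print(f"{name}: chi2 = {chi2:.3f}, interval = {low:.3f} to {high:.3f}")
```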











4. AN APPROXIMATE CONFIDENCE INTERVAL


The simultaneous confidence interval just considered is of course only appropriate when we 
want to place limits on the distances of the true hypothesis from several null hypotheses, or 
when we may want simultaneous limits on the individual parameters as well as on the 
distance. If we only want limits on the distance of the true hypothesis from one null hypo- 
thesis, then a much narrower interval can be found. The purpose of this section is to con- 
struct such an interval approximately. The criterion used in the construction is that the 
likelihood ratios of the two extremes of the interval should be the same; these extremes do 
not cut off equal tail areas and are thus not, separately, confidence limits. We must therefore 
first determine what the likelihood ratio is.

It is shown by Kendall (1946, p. 300) that the logarithm of the likelihood ratio for a
particular hypothesis specifying the value of $\gamma_1$ is
$$-\frac{n}{2}\log\left\{1+\frac{S_1(\gamma_1)}{S_2}\right\}. \tag{22}$$
If we substitute in this expression the value of $\gamma_1$ which minimizes $S_1(\gamma_1)$ subject to the
condition
$$(\gamma_1-\gamma_0)'B(\gamma_1-\gamma_0) = \delta, \tag{23}$$
we shall have the logarithm of the likelihood ratio for $\delta$. It can be shown by a method similar
to the proof of the theorem of § 3 that the minimum value of $S_1(\gamma_1)$ subject to this condition is
$$(S_1^{1/2}-\delta^{1/2})^2, \tag{24}$$
where, as before, $S_1$ is written for $S_1(\gamma_0)$. The likelihood ratio for $\delta$ is thus a decreasing
function of
$$\frac{(S_1^{1/2}-\delta^{1/2})^2}{M_2}. \tag{25}$$
We shall therefore try to find a confidence interval for $\delta$ by asserting that
$$\frac{(S_1^{1/2}-\delta^{1/2})^2}{M_2} < g(F), \tag{26}$$
where $F = S_1/(rM_2)$ and $g$ is a function chosen to make the probability of the truth of the
statement as nearly as possible $1-\alpha$. This is equivalent to asserting that $\delta^{1/2}$ lies between the
limits
$$S_1^{1/2} \pm \{M_2\,g(F)\}^{1/2}. \tag{27}$$
It should be noted that the distribution of the expression (25) depends on the non-centrality
parameter, $\lambda$, but on nothing else; this is why $g$ must be a function of $F$, which provides
information about $\lambda$.
Before constructing a function, $g$, we shall try to find an approximation to the distribution
of (25). Consider
$$y = \frac{(S_1^{1/2}-\delta^{1/2})^2}{\sigma^2}. \tag{28}$$
When $\lambda = 0$, $y$ is distributed as a chi-square variate with $r$ degrees of freedom. When
$\lambda \to \infty$, $y$ tends to a chi-square variate with 1 degree of freedom, since $S_1^{1/2}$ tends to a normal
variate with mean $\delta^{1/2}$ and variance $\sigma^2$ (Patnaik, 1949). For intermediate values of $\lambda$, it
seems reasonable to approximate $ay$ by a chi-square variate with $f$ degrees of freedom, where
$a$ and $f$ are chosen to make the mean and variance of $ay$ and of a chi-square variate with
$f$ degrees of freedom the same. If this approximation is satisfactory, then the expression (25)
multiplied by $a/f$ can be approximated by an F-distribution with $f$ and $(n-k)$ degrees of
freedom.

The mean and variance of $y$ can easily be found from Patnaik's (1949) formulae for the
mean and variance of $S_1^{1/2}$. Writing
$$t = \left(\frac{\lambda}{r+\lambda}\right)^{1/2}, \tag{29}$$
we find
$$E(y) = \frac{1-t}{1+t}\,r + \tfrac{1}{2}t(1+t^2) + O(r^{-1}),$$
$$V(y) = \frac{2(1-t)(1+t^2)}{1+t}\,r + \left[t(1-t)(1-t^2)(1+3t^2) + \tfrac{1}{2}t^2(1+t^2)^2\right] + O(r^{-1}). \tag{30}$$
$ay$ can thus be approximated by a chi-square variate with $f$ degrees of freedom where
$$a = \frac{2E(y)}{V(y)}, \qquad f = \frac{2E^2(y)}{V(y)}. \tag{31}$$
If we knew $\lambda$, we could use this approximation to determine $g$ as
$$g(F) = \frac{f}{a}\,U(f, n-k) = E(y)\,U(f, n-k), \tag{32}$$
where $U(f, n-k)$ is the upper $100\alpha$ % point of the F-distribution with $f$ and $n-k$ degrees of
freedom (for known $\lambda$, $g(F)$ would of course be independent of $F$). However, we do not know
$\lambda$ and so it must be estimated in some way from the observed value of $F$. It is proposed to
calculate $\lambda_1$, the lower $100(1-\alpha)$ % limit of $\lambda$ given $F$, and to determine $g(F)$ from
equation (32) as if this were the known value of $\lambda$. This procedure ensures that the probability of
the confidence interval covering the true value of $\delta$ is exactly $1-\alpha$ when $\delta = 0$; the
probability also tends to $1-\alpha$ when $\lambda \to \infty$.
In principle, an exact lower confidence limit for $\lambda$ can be found from the observed value of
$F$ by solving the equation
$$\int_0^F p(x,\lambda)\,dx = 1-\alpha \tag{33}$$
for $\lambda$, where $p(x,\lambda)$ is the density function of the non-central F-distribution with
non-centrality parameter $\lambda$. In practice, however, $p$ is too complicated to evaluate, and so
Patnaik's (1949) approximation will be used, that $(1-t^2)F$ can be regarded as a central F-variate
with $\nu$ and $n-k$ degrees of freedom where
$$\nu = \frac{(r+\lambda)^2}{r+2\lambda} \tag{34}$$
and $t$ is defined in equation (29). Thus in order to find $\lambda_1$, the lower limit for $\lambda$, we must solve
the equation
$$(1-t^2)F = U(\nu, n-k) \tag{35}$$
in terms of $\lambda$. This is equivalent to
$$\lambda = r\left[\frac{F}{U(\nu, n-k)} - 1\right]. \tag{36}$$
This is an implicit equation since $\nu$ is a function of $\lambda$, but an iterative procedure converges
fairly quickly. If $F < U(r, n-k)$, $\lambda_1$ is to be taken as zero.
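
The iterative solution of (36) can be sketched as a fixed-point loop. The sketch assumes that the caller supplies a function `upper_point(nu)` returning $U(\nu, n-k)$ for fractional $\nu$, for instance from tables or from a statistical library; that function, the tolerance and the iteration cap are assumptions of this illustration, not part of the paper.

```python
def patnaik_nu(r, lam):
    """Equation (34): degrees of freedom of the approximating central F."""
    return (r + lam) ** 2 / (r + 2 * lam)

def lower_limit_lambda(F, r, upper_point, tol=1e-4, max_iter=50):
    """Iterate lam = r*(F/U(nu, n-k) - 1), equation (36), from the start r*(F - 1).

    upper_point(nu) must return U(nu, n-k); negative values are replaced by zero,
    which reproduces the rule that the limit is zero when F < U(r, n-k).
    """
    lam = max(r * (F - 1.0), 0.0)
    for _ in range(max_iter):
        new = max(r * (F / upper_point(patnaik_nu(r, lam)) - 1.0), 0.0)
        if abs(new - lam) < tol:
            return new
        lam = new
    return lam

# First step of the iteration with r = 2 and a starting value of 5.58
print(round(patnaik_nu(2, 5.58), 2))
```

If SciPy is available, `scipy.stats.f.ppf(1 - alpha, nu, n_minus_k)` accepts fractional degrees of freedom and could serve as `upper_point`; the loop itself does not depend on it.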

The calculation of the approximate confidence interval for $\delta$ is not so complicated as the
above development might suggest and can be performed in four steps:

(1) Find $\lambda_1$ by solving (36) for $\lambda$.

(2) Find $E(y)$, $V(y)$ and $f$ from (30) and (31), using $\lambda_1$ instead of $\lambda$ to calculate $t$.

(3) Find $g(F)$ from (32).

(4) Assert that $\delta^{1/2}$ lies between the limits given in (27).

Example 3. Let us find a confidence interval for the data of Example 1 by this method.

(1) We must first solve (36) by iteration. Starting from $r(F-1) = 5.58$, which is
an unbiased estimate of $\lambda$, we find $\nu = 7.58^2/13.16 = 4.37$; $U(4.37, 139) = 2.38$ and
$\lambda = (3.79/2.38 - 1) \times 2 = 1.18$. Substituting this value, we find $\nu = 2.32$, $U(2.32, 139) = 2.93$
and $\lambda = 0.587$. Reiterating, $\nu = 2.11$, $U = 3.02$ and $\lambda = 0.538$. Reiterating, $\nu = 2.094$,
$U = 3.02$ and $\lambda = 0.538$. This is the solution of (36).

(2) Substituting this figure in (29), $t^2 = 0.212$ and $t = 0.460$. Hence $E(y) = 1.018$,
$V(y) = 2.269$ and $f = 0.913$.

(3) Evaluating $U(0.913, 139)$ by the method of the next section we find that it is 4.05 and
so $g(F) = 4.05 \times 1.018 = 4.12$.

(4) $M_2\,g(F) = 4.12 \times 31.5 = 129.8$. Thus the limits for $\delta^{1/2}$ are $15.45 \pm 11.39$ and the limits
for $(\delta/n)^{1/2}$ are 0.34 to 2.25 mm.
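
Steps (2) to (4) are simple enough to script. The sketch below assumes that $\lambda_1$ has already been found and that the caller supplies `upper_point(f)` returning $U(f, n-k)$; it uses the leading terms of (30) with $t = \{\lambda/(r+\lambda)\}^{1/2}$ and takes $g(F) = E(y)\,U(f, n-k)$, which is the product used in step (3) of Example 3. Function and argument names are illustrative.

```python
import math

def moments_of_y(r, lam):
    """E(y) and V(y) from (30), dropping the O(1/r) remainders."""
    t = math.sqrt(lam / (r + lam))                       # equation (29)
    mean = (1 - t) / (1 + t) * r + 0.5 * t * (1 + t**2)
    var = (2 * (1 - t) * (1 + t**2) / (1 + t) * r
           + t * (1 - t) * (1 - t**2) * (1 + 3 * t**2)
           + 0.5 * t**2 * (1 + t**2) ** 2)
    return mean, var

def approximate_limits(s1, m2, r, lam1, upper_point):
    """Return f, g(F) and the two limits S1^(1/2) -/+ (M2*g)^(1/2) for delta^(1/2)."""
    mean, var = moments_of_y(r, lam1)
    f = 2 * mean**2 / var                                # equation (31)
    g = mean * upper_point(f)                            # E(y) * U(f, n-k)
    half_width = math.sqrt(m2 * g)
    root_s1 = math.sqrt(s1)
    return f, g, root_s1 - half_width, root_s1 + half_width

# Figures of Example 3: lambda_1 = 0.538 and U(f, 139) = 4.05 are taken from the text
f, g, lo, hi = approximate_limits(238.59, 31.5, 2, 0.538, lambda f: 4.05)
print(round(f, 3), round(g, 2), round(lo / math.sqrt(142), 2), round(hi / math.sqrt(142), 2))
```

Small differences in the third decimal place from the figures in the example arise because the example rounds $t$ before substituting.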


5. PERCENTAGE POINTS OF F WITH FRACTIONAL DEGREES OF FREEDOM


In the method developed in the previous section, it is necessary to know $U(f_1, f_2)$, the upper
percentage point of the F distribution with $f_1$ and $f_2$ degrees of freedom, where $f_1$ is usually
fractional. When $f_1$ is less than 2, interpolation in the ordinary tables is no longer adequate.
I have therefore tabulated $U(f_1, \infty)$ for $f_1 = 0.2\,(0.2)\,2.2$, at the 1 and 5 % points. Linear
interpolation in this table is quite adequate and, provided that $f_2$ is an integer greater than 6,
$U(f_1, f_2)$ can be found from the interpolation formula:
$$U(f_1, f_2) = U(2, f_2) + \frac{U(f_1, \infty) - U(2, \infty)}{U(1, \infty) - U(2, \infty)}\,[U(1, f_2) - U(2, f_2)]. \tag{37}$$

The accuracy of this formula has been checked by obtaining $U(2, f_2)$ from the values of
$U(1, f_2)$, $U(3, f_2)$, $U(1, \infty)$, $U(3, \infty)$ and $U(2, \infty)$ substituted in the corresponding
interpolation formula. Even for $f_2$ as low as 6 these values were within 0.1 % of the correct value.


Table 5. Upper 1 % and 5 % points of F for $f_1 = 0.2\,(0.2)\,2.2$ and $f_2 = \infty$

$f_1$      1 %        5 %          $f_1$      1 %        5 %
0.2      15.885     5.8044         1.4      5.5368     3.4037
0.4      11.012     5.1526         1.6      5.1623     3.2439
0.6      8.7980     4.5745         1.8      4.8581     3.1100
0.8      7.5002     4.1549         2.0      4.6052     2.9957
1.0      6.6349     3.8415         2.2      4.3911     2.8970
1.2      6.0110     3.5983
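
Formula (37) together with the 5 % column of Table 5 can be put into a small lookup. In the sketch below the values of $U(1, f_2)$ and $U(2, f_2)$ are passed in by the caller, as they would be read from ordinary F tables; the figure 3.91 used for $U(1, 139)$ in the usage line is an approximate value assumed for illustration, while 3.06 for $U(2, 139)$ is the value used in Example 1.

```python
# 5 % column of Table 5 (f2 = infinity), keyed by f1
UPPER_5 = {0.2: 5.8044, 0.4: 5.1526, 0.6: 4.5745, 0.8: 4.1549,
           1.0: 3.8415, 1.2: 3.5983, 1.4: 3.4037, 1.6: 3.2439,
           1.8: 3.1100, 2.0: 2.9957, 2.2: 2.8970}

def u_inf(f1, table=UPPER_5):
    """U(f1, infinity) by linear interpolation between tabulated values."""
    keys = sorted(table)
    if not keys[0] <= f1 <= keys[-1]:
        raise ValueError("f1 lies outside the tabulated range")
    for lo, hi in zip(keys, keys[1:]):
        if lo <= f1 <= hi:
            w = (f1 - lo) / (hi - lo)
            return table[lo] + w * (table[hi] - table[lo])

def u_fractional(f1, u1_f2, u2_f2, table=UPPER_5):
    """Equation (37): u1_f2 = U(1, f2) and u2_f2 = U(2, f2) from ordinary tables."""
    ratio = (u_inf(f1, table) - table[2.0]) / (table[1.0] - table[2.0])
    return u2_f2 + ratio * (u1_f2 - u2_f2)

# U(0.913, 139), with assumed U(1, 139) = 3.91 and U(2, 139) = 3.06
print(round(u_fractional(0.913, 3.91, 3.06), 2))
```

By construction the formula returns $U(1, f_2)$ at $f_1 = 1$ and $U(2, f_2)$ at $f_1 = 2$, so it only bends the tabulated limiting curve to pass through the two known points.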











6. THE ACCURACY OF THE APPROXIMATION 


The probability of the interval developed in § 4 covering the true value of $\delta$ is exactly $1-\alpha$
when $\lambda = 0$ or when $\lambda \to \infty$. Some values of this probability, $P$, for intermediate values of $\lambda$
have been worked out by direct numerical integration and are given in Table 6.

The problem is to evaluate
$$P = \Pr\left[\frac{(S_1^{1/2}-\delta^{1/2})^2}{M_2} < g(F)\right] = \Pr\left[\frac{(S_1^{1/2}-\delta^{1/2})^2}{S_1} < h(F)\right], \tag{38}$$
where $h(F) = g(F)/(rF)$. Now $h$ is a strictly decreasing function of its argument and so the
inverse function, $h^{-1}$, exists. Thus, if we define
$$z = \{1-\sqrt{(\delta/S_1)}\}^2, \tag{39}$$
we can write
$$P = \Pr[F < h^{-1}(z)] = \Pr\left[\frac{S_1}{r\,h^{-1}(z)} < M_2\right]. \tag{40}$$






















This quantity can be evaluated by direct numerical integration, using Pearson's (1922)
tables of the incomplete gamma function; the density function of $S_1$ can be expressed as
a simple combination of elementary functions when $r$ is odd and as an infinite sum of Poisson
functions when $r$ is even (see Fisher, 1928; Patnaik, 1949). When $r$ is small these expressions
can be quite easily evaluated.


Table 6. The exact probability level of the confidence interval when the nominal level is 0.95

$r$ is the number of degrees of freedom for 'treatments', $n-k$ for error.

                                  $\lambda$
 $r$    $n-k$      0        4       16       64      256     $\infty$
  2      12     0.950    0.957    0.956    0.951    0.950    0.950
  2      24     0.950    0.956    0.957    0.951    0.950    0.950
  2   $\infty$  0.950    0.955    0.959    0.951    0.950    0.950

  4      12     0.950    0.954    0.956    0.951    0.950    0.950
  4      24     0.950    0.951    0.955    0.950    0.950    0.950
  4   $\infty$  0.950    0.949    0.953    0.950    0.950    0.950

  7      12     0.950    0.952    0.954    0.955    0.952    0.950
  7      24     0.950    0.952    0.952    0.954    0.951    0.950
  7   $\infty$  0.950    0.950    0.949    0.953    0.950    0.950








Fig. 1. The 95 % approximate confidence interval (broken) and the upper and lower 97½ % confidence
limits (continuous) for $\lambda^{1/2}$ when $r = 4$, $n-k = \infty$. The long and short dotted line is the maximum
likelihood estimate. It is assumed that $\sigma^2 = 1$.

It can be seen from the results in Table 6 that the interval is usually a little too conservative,
that is, when the nominal level is 95 %, the actual level is usually slightly greater than
95 %. However, it never exceeds 96 % and is usually between 95 and 95½ %; this is probably
sufficiently accurate for most practical purposes. The evaluation of the density function of
the non-central chi-square distribution becomes rather tedious when $r$ is greater than 7 and
the table has not been extended beyond this point. It can be plausibly conjectured, however,
that similar results would be obtained if it were extended.

As was shown in § 4, exact confidence limits can be placed on the non-centrality
parameter, $\lambda$. It is interesting to see how these limits compare with the interval proposed in this
paper when the number of degrees of freedom for error is infinite (when of course the problems
of setting limits on $\delta$ and $\lambda$ are the same since $\sigma^2$ is known). The 95 % interval and the upper
and lower 97½ % limits are shown in Fig. 1 for the case $r = 4$. It will be seen that the limits
of the interval are always greater than the corresponding two-sided limits and also that, for
small $S_1$, the interval is wider than the distance between the two-sided limits. The curious
kink in the upper limit of the interval just above the upper 5 % point of $S_1$ on the null
hypothesis is due to the extreme rapidity with which the effective number of degrees of freedom
of $y$, defined by (28), decreases near $\lambda = 0$. Despite these facts, it seems to me that the
confidence interval is to be preferred to the two-sided limit, since the upper limit of the
latter is, for small $S_1$, less than the maximum likelihood estimate of $\lambda$; this seems to me to be
most undesirable.


I am indebted to Mr A. M. Walker for many valuable discussions.


THE EFFICIENCIES OF ALTERNATIVE ESTIMATORS FOR
AN ASYMPTOTIC REGRESSION EQUATION 


By D. J. FINNEY 
University of Aberdeen and A.R.C. Unit of Statistics 


SUMMARY 


Methods for the estimation of the parameter $\rho$ in equation (1) have been discussed, this
equation representing the expectation of observations on a quantity $y$ for a specified value
of $x$. The methods all involve taking as the estimator the ratio of either two linear functions
or two quadratic functions of the observed $y$. Their relative efficiencies and biases are
considered under two models for the generation of observations, the $y$ either being independent
and normally distributed about their expectations with constant variance or arising from
a continuous autoregressive process in which the variance increases and successive values
of $y$ for an individual are correlated.

Detailed investigation has been possible only for the very simple situation of four
observations equally spaced in $x$. As is well known, the Patterson estimator, a ratio of two linear
functions, is of high asymptotic efficiency for the first model, and it proves to be also highly
efficient for the second. An interesting alternative is the calculation of a linear regression
of $y_{i+1}$ on $y_i$. This is also highly efficient, and has the advantage of simultaneously estimating
the parameter $\alpha$; moreover, under the second model, the estimators of $\rho$ and $\alpha$ maximize
the likelihood for any number of equally spaced observations. However, the estimator of
$\rho$ appears likely to have a considerable negative bias if the variance of observations about
their expectation is at all large. Other quadratic estimators have been examined, but showed
no special merits.

The indications are that the Patterson estimator is always a fairly safe one to use, for 
any number of equally spaced observations; its efficiency is never low, and it is unlikely to 
be seriously biased. The estimator calculated autoregressively is likely to be more efficient 
(in the narrow sense of having a smaller asymptotic variance), especially for the second 
model and for a larger number of observations, and if the variance per observation is low 
this advantage will not be offset by its greater expectation of bias. 


1. INTRODUCTION 


The regression equation
$$E(y) = Y = \alpha - \beta\rho^x, \tag{1}$$
expressing the dependence of $y$ on an independent variate $x$ in terms of parameters $\alpha$, $\beta$, $\rho$
has many uses in biology and biometry. For example, it has often been used to give an
approximate graduation of the yield of a crop receiving an amount $x$ of fertilizer (Crowther
& Yates, 1941; Hodnett, 1956); again, during some phases of growth, the relation between a
measure of size of an animal and time may be approximately of this form. The parameter
$\rho$ must satisfy $0 < \rho < 1$. In practical applications, $\alpha$ and $\beta$ are usually positive with $\beta \le \alpha$,
but this is neither essential nor a limitation on the discussion that follows. It is worth noting
that a change in the origin of $x$ simply changes the value of $\beta$.

In estimating the parameters of equation (1) from observations, $\rho$ presents the chief
problem. If $\rho$ were known, $\alpha$ and $\beta$ would be estimated from a linear regression of $y$ on $\rho^x$;
even if only an estimate of $\rho$ is available, this calculation is a reasonable method of obtaining
$\alpha$ and $\beta$ though in general not the most efficient. For observations entirely unrestricted in
respect of $x$, the estimation of all parameters simultaneously, or even of $\rho$ alone, is
necessarily laborious. A number of useful suggestions have been made, however, for the important
special case of equal numbers of observations at each of several equally spaced values of $x$,
with equal variance for all observations. Stevens (1951) showed a satisfactory routine for
constructing and solving the maximum likelihood equations, and provided tables for
assisting this process. Pimentel Gomes (1953) developed this technique further. Patterson
(1956) proposed an ingenious and simple method of estimating $\rho$ as the ratio of two linear
functions, and showed it to be highly efficient at least for a moderate number of different
values of $x$. Hartley (1948) suggested a very different procedure that he termed 'internal
regression'. The primary purpose of this paper is to study the relations between the Patterson
method and alternatives related to internal regression, and to discuss their relative
efficiencies. Two modes of generation of error are also considered. Complexity of the algebra
restricts the detailed discussion to the simplest case, namely, that of four observations
equally spaced in respect of $x$.


2. THE CLASS OF ESTIMATORS 


For equally spaced observations, the values of $x$ may be taken as $i = 1, 2, 3, \ldots, n$; the data
for analysis will be equal numbers of replicate single observations at each value of $i$. The
mean of the observations at a particular $i$, denoted by $y_i$, will in general deviate from the
expectation given by equation (1). In this paper, two models for the variance of $y_i$ will be
considered. For both models, the variance for a particular $i$ is supposed to be inversely
proportional to the number of replicates, but one has a variance that is constant for all $i$
and the other a variance that increases as $i$ increases. Thus, in respect of replication for the
same set of $x$ values, each $y_i$ is a consistent estimator of $Y_i$, where
$$Y_i = \alpha - \beta\rho^i. \tag{2}$$
All estimators of $\rho$ to be discussed are ratios of two functions of the $y_i$. Each is of the form
$$r = A/B, \tag{3}$$
where $A$ and $B$, functions of the $y_i$, are consistent estimators of two functions of $\rho$ whose
ratio is $\rho$; that is to say,
$$E(A) = \xi, \qquad E(B) = \eta, \tag{4}$$
where, as replication at each value of $i$ increases,
$$\xi \to \xi_0, \qquad \eta \to \eta_0 \tag{5}$$
and
$$\rho = \xi_0/\eta_0. \tag{6}$$

For some of the estimators considered, $A$ and $B$ are unbiased estimators of $\xi_0$ and $\eta_0$, so that
$\xi$, $\eta$ are equal to $\xi_0$, $\eta_0$ respectively, whatever the replication; more generally, they contain
terms involving the variances of the $y_i$ that tend to zero as the replication increases. Although
$r$ is a consistent estimator of $\rho$, it is not in general unbiased even when $\xi = \xi_0$, $\eta = \eta_0$.

The estimators that form the subject of this paper have $A$ and $B$ in the form of linear or
quadratic functions of the $y_i$, and will be referred to as 'linear' or 'quadratic estimators'.


3. EXPECTATION AND VARIANCE OF A RATIO

Write
$$R = \xi/\eta, \tag{7}$$
and define the bivariate moments of $A$ and $B$ by
$$\nu_{st} = E\{(A-\xi)^s(B-\eta)^t\}. \tag{8}$$
Then, by writing
$$r = R\left(1+\frac{A-\xi}{\xi}\right)\left(1+\frac{B-\eta}{\eta}\right)^{-1},$$
expanding in series and taking expectations, an asymptotic expression for the expectation
of $r$ is easily obtained:
$$E(r) = R - \frac{\nu_{11}-R\nu_{02}}{\eta^2} + \frac{\nu_{12}-R\nu_{03}}{\eta^3} - \frac{\nu_{13}-R\nu_{04}}{\eta^4} + \ldots \tag{9}$$
An alternative expansion in terms of $\rho$ begins
$$E(r) = \rho + \frac{(\xi-\xi_0)-\rho(\eta-\eta_0)}{\eta_0} - \frac{(\eta-\eta_0)\{(\xi-\xi_0)-\rho(\eta-\eta_0)\}}{\eta_0^2} - \frac{\nu_{11}-\rho\nu_{02}}{\eta_0^2} + \ldots \tag{10}$$
Equation (9) displays the bias in $r$ regarded as an estimator of $R$, equation (10) the bias in
$r$ regarded as an estimator of $\rho$.

A similar procedure leads to an asymptotic expansion for the variance of $r$:
$$V(r) = E\{r-E(r)\}^2 = \frac{1}{\eta^2}(\nu_{20}-2R\nu_{11}+R^2\nu_{02}) - \frac{2}{\eta^3}(\nu_{21}-2R\nu_{12}+R^2\nu_{03})$$
$$\qquad + \frac{1}{\eta^4}\{(3\nu_{22}-\nu_{11}^2) - 2R(3\nu_{13}-\nu_{11}\nu_{02}) + R^2(3\nu_{04}-\nu_{02}^2)\} + \ldots \tag{11}$$
The first term of this is the expression usually quoted as the asymptotic variance for a ratio,
namely
$$V(r) = V(A-RB)/\eta^2, \tag{12}$$
and to the first order $R$ may be replaced by $\rho$.

If $A$ and $B$ are normally distributed quantities, all odd moments vanish;
$$\nu_{st} = 0 \quad \text{if } (s+t) \text{ is odd}$$
and
$$\nu_{22} = \nu_{20}\nu_{02} + 2\nu_{11}^2, \qquad \nu_{13} = 3\nu_{11}\nu_{02}, \qquad \nu_{04} = 3\nu_{02}^2.$$
Hence
$$E(r) = R - \frac{\nu_{11}-R\nu_{02}}{\eta^2}\left(1+\frac{3\nu_{02}}{\eta^2}\right) \tag{9'}$$
and
$$V(r) = \frac{1}{\eta^2}(\nu_{20}-2R\nu_{11}+R^2\nu_{02}) + \frac{1}{\eta^4}(3\nu_{20}\nu_{02}+5\nu_{11}^2-16R\nu_{11}\nu_{02}+8R^2\nu_{02}^2) + \ldots \tag{11'}$$
These expressions agree with those obtained by Merrill (1928).
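
For numerical work the normal-theory expressions (9') and (11') can be wrapped in a single function. The sketch below evaluates the terms shown, up to order $\eta^{-4}$, from the means and second-order moments of $A$ and $B$; the function name and argument names are illustrative.

```python
def ratio_moments_normal(xi, eta, v20, v11, v02):
    """Approximate E(r) and V(r) for r = A/B from (9') and (11').

    xi, eta are E(A), E(B); v20, v11, v02 are V(A), C(A, B), V(B).
    A and B are assumed jointly normal, so odd moments vanish.
    """
    R = xi / eta
    mean = R - (v11 - R * v02) / eta**2 * (1 + 3 * v02 / eta**2)
    var = ((v20 - 2 * R * v11 + R**2 * v02) / eta**2
           + (3 * v20 * v02 + 5 * v11**2 - 16 * R * v11 * v02
              + 8 * R**2 * v02**2) / eta**4)
    return mean, var

# Illustrative values only: uncorrelated A and B with equal small variances
mean, var = ratio_moments_normal(xi=1.0, eta=2.0, v20=0.01, v11=0.0, v02=0.01)
print(mean, var)
```

The first term of the variance is the familiar $V(A-RB)/\eta^2$ of (12); the second term shows how quickly the correction grows when $V(B)$ is not small relative to $\eta^2$.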
4. CONSTANT VARIANCE MODEL 
The simplest model that may be postulated for the errors to which the $y_i$ are subject is that
of each $y_i$ being normally distributed about $Y_i$ with constant variance and zero correlation
between observations. That is to say
$$E\{(y_i-Y_i)(y_j-Y_j)\} = \sigma^2 \quad \text{if } i = j, \qquad = 0 \quad \text{if } i \ne j. \tag{13}$$
This is likely to be a reasonable approximation in some circumstances, notably when the
$y_i$ are mean yields of a crop in a field experiment to compare different levels, $i$, of fertilizer
application. It is the model discussed by Stevens (1951) and Pimentel Gomes (1953), who
developed the maximum likelihood estimation equations. According to Patterson (1956),
Lipton has extended Stevens's tables so as to cover the range $3 < n < 12$.

For the particular case of $n = 4$ important to this paper, the variance of the maximum
likelihood estimator of $\rho$ is
$$V(r) \sim \frac{\phi^2}{\rho^2(1-\rho)^2}\,\frac{(3+4\rho+3\rho^2)(1+\rho^2)}{2(1+2\rho+4\rho^2+2\rho^3+\rho^4)}, \tag{14}$$
where
$$\phi^2 = \sigma^2/\beta^2. \tag{15}$$


5. PATTERSON’S LINEAR ESTIMATOR 


For comparison with subsequent sections, it is necessary to summarize here the results
that Patterson obtained for the general linear estimator
$$r_P = \frac{\mu_1 y_n + \mu_2 y_{n-1} + \ldots + \mu_{n-1}y_2}{\mu_1 y_{n-1} + \mu_2 y_{n-2} + \ldots + \mu_{n-1}y_1}, \tag{16}$$
where
$$\sum_{1}^{n-1}\mu_i = 0.$$
This has the general form described in § 2, with expectations of numerator and denominator
independent of $\sigma^2$, and therefore $\xi = \xi_0$, $\eta = \eta_0$. From equation (11) or (11')
$$V(r_P) = \frac{\phi^2\{\mu_1^2 + (\mu_2-\rho\mu_1)^2 + (\mu_3-\rho\mu_2)^2 + \ldots + \rho^2\mu_{n-1}^2\}}{\rho^2(\mu_1\rho^{n-2} + \mu_2\rho^{n-3} + \ldots + \mu_{n-1})^2}. \tag{17}$$

Values of the $\mu_i$ for minimizing $V(r_P)$ can readily be obtained, though the general solution
does not appear to be expressible very simply. Patterson has proved that the minimizing
$\mu_i$ make $r_P$ equal to $r$, the maximum likelihood estimator of $\rho$, a result which is of limited
value because the $\mu_i$ are themselves functions of $\rho$ and therefore an iterative calculation is
required.

For $n = 4$, the estimator may be written
$$r_P = \frac{y_4 + (\lambda-1)y_3 - \lambda y_2}{y_3 + (\lambda-1)y_2 - \lambda y_1}. \tag{18}$$
Then
$$E(A) = \beta\rho^2(1-\rho)(\rho+\lambda), \qquad E(B) = \beta\rho(1-\rho)(\rho+\lambda),$$
which have the properties already mentioned. Also, by equation (11') or (17)
$$V(r_P) = \frac{2\phi^2}{\rho^2(1-\rho)^2}\,\frac{(1+\lambda^2)(1+\rho+\rho^2) - \lambda(1+\rho)^2}{(\lambda+\rho)^2}. \tag{19}$$
This is minimized by
$$\lambda = \frac{2+3\rho+4\rho^2+\rho^3}{1+4\rho+3\rho^2+2\rho^3}, \tag{20}$$
and substitution of this in equation (19) gives the same expression as in (14), in accordance
with the theorem due to Patterson stated above.
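
The statement that the minimizing $\lambda$ turns (19) into (14) is easy to check numerically. In the sketch below the common factor $\phi^2/\{\rho^2(1-\rho)^2\}$ is omitted from both variances, since it cancels in any comparison; the expression used for the minimizing $\lambda$ is the one obtained by differentiating (19), and the function names are illustrative.

```python
def ml_variance_factor(p):
    """Equation (14) without the factor phi^2 / (p^2 (1 - p)^2)."""
    return ((3 + 4*p + 3*p**2) * (1 + p**2)
            / (2 * (1 + 2*p + 4*p**2 + 2*p**3 + p**4)))

def patterson_variance_factor(p, lam):
    """Equation (19) without the same factor."""
    return 2 * ((1 + lam**2) * (1 + p + p**2) - lam * (1 + p)**2) / (lam + p)**2

def optimal_lambda(p):
    """Value of lambda minimizing (19), equation (20)."""
    return (2 + 3*p + 4*p**2 + p**3) / (1 + 4*p + 3*p**2 + 2*p**3)

def efficiency(p, lam):
    """Asymptotic efficiency of r_P relative to maximum likelihood."""
    return ml_variance_factor(p) / patterson_variance_factor(p, lam)

for p in (0.1, 0.357, 0.6, 0.9):
    print(p, round(optimal_lambda(p), 3), round(efficiency(p, 1.25), 4))
```

At $\rho = 0.357$ the minimizing $\lambda$ is 1.25 to three decimals, which is why the fixed choice $\lambda = 1.25$ is fully efficient there.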


Patterson observed that use of $\lambda = 1.25$ preserves a high efficiency of $r_P$ whatever the
value of $\rho$; this value is fully efficient at $\rho = 0.357$. Equations (18), (19) give
$$r_{P,1.25} = \frac{4y_4 + y_3 - 5y_2}{4y_3 + y_2 - 5y_1} \tag{21}$$
and
$$V(r_{P,1.25}) = \frac{\phi^2}{\rho^2(1-\rho)^2}\,\frac{2(21+\rho+21\rho^2)}{(5+4\rho)^2}. \tag{22}$$
He tabulated the efficiency of this estimator. He also mentioned that, if $\rho$ is large (say
$\rho > 0.6$), the estimator with $\lambda = 1$ has a smaller variance
$$r_{P,1} = \frac{y_4 - y_2}{y_3 - y_1}, \tag{23}$$
$$V(r_{P,1}) = \frac{\phi^2}{\rho^2(1-\rho)^2}\,\frac{2(1+\rho^2)}{(1+\rho)^2}; \tag{24}$$
similarly, if $\rho$ is small (say $\rho < 0.15$), $\lambda = 2$ is preferable
$$r_{P,2} = \frac{y_4 + y_3 - 2y_2}{y_3 + y_2 - 2y_1}, \tag{25}$$
$$V(r_{P,2}) = \frac{\phi^2}{\rho^2(1-\rho)^2}\,\frac{2(3+\rho+3\rho^2)}{(2+\rho)^2}. \tag{26}$$

If the efficiency of these estimators were not high enough, the maximum likelihood
estimator could be obtained by substitution of $r_P$ for $\rho$ in equation (20) followed by iteration
on this equation and equation (18). In practice, one cycle of iteration would almost certainly
be adequate.

Although in practice the biases in these estimators are not likely to be important, and
their consistency ensures that they approach the population value, $\rho$, as the replication is
increased, there is some interest in employing equation (10) to compare the magnitudes
of the biases to the order of the term in $\phi^2$. For the general linear estimator
$$E(r_P) = \rho + \frac{\phi^2}{\rho^2(1-\rho)^2}\,\frac{(1-\lambda)^2 + 2\rho(1-\lambda+\lambda^2)}{(\lambda+\rho)^2}. \tag{27}$$
With the three particular constant values of $\lambda$ that have been used in this section, (27)
becomes
$$E(r_{P,1.25}) = \rho + \frac{\phi^2}{\rho^2(1-\rho)^2}\,\frac{1+42\rho}{(5+4\rho)^2}, \tag{28}$$
$$E(r_{P,1}) = \rho + \frac{\phi^2}{\rho^2(1-\rho)^2}\,\frac{2\rho}{(1+\rho)^2}, \tag{29}$$
$$E(r_{P,2}) = \rho + \frac{\phi^2}{\rho^2(1-\rho)^2}\,\frac{1+6\rho}{(2+\rho)^2}. \tag{30}$$

It is worth noting that, although $r_{P,2}$ is more precise than $r_{P,1.25}$ when $\rho$ is very small, the
bias is relatively greater (Table 2).


6. TAYLOR’S ESTIMATOR 
Dr St C. S. Taylor has suggested to me that a very simple procedure for estimating $\rho$ is to
calculate a regression coefficient of $y_{i+1}$ on $y_i$ ($i = 1, 2, \ldots, n-1$) by the ordinary formulae
of unweighted linear regression, neglecting any complications resulting from the occurrence
of $y_i$ both as the 'independent' and as the 'dependent' variate. This is suggested by a
recurrence relation derived from equation (2)
$$Y_{i+1} = \alpha(1-\rho) + \rho Y_i, \tag{31}$$
which, without implying any optimal properties, gives some hope that useful estimates of
$\alpha$ and of $\rho$ might be obtained from the regression equation.

Evidently the numerator and denominator of the usual expression for the regression
coefficient of $y_{i+1}$ on $y_i$ are quadratic functions of the $y_i$. Moreover, since
$$E(y_iy_j) = Y_i^2 + \sigma^2 \quad \text{if } i = j, \qquad = Y_iY_j \quad \text{if } i \ne j, \tag{32}$$
the expectations of numerator and denominator are of the form specified in § 2.

When $n = 4$, the estimator of $\rho$ becomes
$$r_T = \frac{2y_4y_3 - y_4y_2 - y_4y_1 - y_3^2 + y_3y_2 - y_3y_1 - y_2^2 + 2y_2y_1}{2(y_3^2 - y_3y_2 - y_3y_1 + y_2^2 - y_2y_1 + y_1^2)}, \tag{33}$$
for which it is easily verified that
$$E(A) = 2\beta^2\rho^3(1-\rho)^2(1+\rho+\rho^2) - 2\sigma^2, \qquad E(B) = 2\beta^2\rho^2(1-\rho)^2(1+\rho+\rho^2) + 6\sigma^2. \tag{34}$$

Evaluation of the first-order term in the asymptotic variance of $r_T$, from equation (11), is
a tedious piece of algebra. As both numerator and denominator are independent of $\alpha$, the
transformation
$$z_i = y_i - \alpha \tag{35}$$
enables $r_T$ to be written as exactly the same function of the $z_i$. Moreover, from elementary
properties of the normal distribution, variances and covariances of quadratic functions of
the $z_i$ can be evaluated as
$$\begin{aligned}
V(z_i^2) &= 4\beta^2\sigma^2\rho^{2i} + 2\sigma^4,\\
V(z_iz_j) &= \beta^2\sigma^2(\rho^{2i}+\rho^{2j}) + \sigma^4,\\
C(z_i^2, z_iz_j) &= 2\beta^2\sigma^2\rho^{i+j},\\
C(z_i^2, z_j^2) &= 0,\\
C(z_i^2, z_jz_k) &= 0,\\
C(z_iz_j, z_iz_k) &= \beta^2\sigma^2\rho^{j+k},\\
C(z_1z_2, z_3z_4) &= 0, \text{ etc.,}
\end{aligned} \tag{36}$$
for $i \ne j \ne k$.

In forming equation (12), only the term of order $\phi^2$ will be considered. By use of (36),
this is found to be
$$V(r_T) = \frac{\phi^2}{\rho^2(1-\rho)^2}\,\frac{3+4\rho+4\rho^2+4\rho^3+3\rho^4}{2(1+\rho+\rho^2)^2}, \tag{37}$$
an expression very similar to (14), but slightly greater except at the limits of the range of $\rho$.
By (10), the expectation of $r_T$ can be found as
$$E(r_T) = \rho - \frac{\phi^2}{\rho^2(1-\rho)^2}\,\frac{1+4\rho+\rho^2}{2(1+\rho+\rho^2)^2}. \tag{38}$$
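
Taylor's estimator is just an ordinary regression slope, and the explicit form (33) for $n = 4$ follows from writing the corrected sums of squares and products in full. The sketch below computes the slope both ways and confirms that error-free observations following $Y_i = \alpha - \beta\rho^i$ return $\rho$ exactly; the parameter values are illustrative only.

```python
def taylor_estimate(y):
    """Unweighted regression coefficient of y[i+1] on y[i]."""
    x, z = y[:-1], y[1:]
    m = len(x)
    sxz = m * sum(a * b for a, b in zip(x, z)) - sum(x) * sum(z)
    sxx = m * sum(a * a for a in x) - sum(x) ** 2
    return sxz / sxx

def taylor_estimate_n4(y1, y2, y3, y4):
    """Equation (33), the same ratio written out for n = 4."""
    num = (2*y4*y3 - y4*y2 - y4*y1 - y3**2 + y3*y2 - y3*y1
           - y2**2 + 2*y2*y1)
    den = 2 * (y3**2 - y3*y2 - y3*y1 + y2**2 - y2*y1 + y1**2)
    return num / den

alpha, beta, rho = 10.0, 6.0, 0.5              # illustrative values
exact = [alpha - beta * rho**i for i in range(1, 5)]
noisy = [1.0, 2.5, 2.9, 3.6]                   # arbitrary observations

print(taylor_estimate(exact))
print(taylor_estimate(noisy), taylor_estimate_n4(*noisy))
```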


7. HARTLEY ESTIMATORS


In a paper in which he introduced a method that he described as 'internal least squares',
Hartley (1948) appeared to suggest a method of estimation analogous to Taylor's but
instead using a regression of $(y_{i+1}-y_i)$ on $(y_{i+1}+y_i)$. In fact, he goes on to advocate
something rather more complicated, but before this is discussed the simpler alternative is worth
examination. From equation (2)
$$Y_{i+1} - Y_i = \frac{2\alpha(1-\rho)}{1+\rho} - \frac{1-\rho}{1+\rho}(Y_{i+1}+Y_i). \tag{39}$$
Hence, computation of a regression equation of $(y_{i+1}-y_i)$ on $(y_{i+1}+y_i)$ should give estimates
of $\alpha$ and $\rho$, and, a priori, this method of estimation might be expected to be similar in
character to Taylor's.

The numerator and denominator of the regression coefficient, formed uncritically in the
ordinary manner, are quadratic functions of the $y_i$; the ratio of the difference between these
quantities to their sum is an estimator of $\rho$ in the sense of § 2. The properties of the estimator
can be studied exactly as in § 6. For $n = 4$, the estimator is
$$r_h = \frac{2y_4^2 - 3y_4y_2 - y_4y_1 + y_3^2 - y_3y_2 - y_3y_1 + y_2^2 + 2y_2y_1}{2y_4y_3 - y_4y_2 - y_4y_1 + y_3^2 - y_3y_2 - 3y_3y_1 + y_2^2 + 2y_1^2}. \tag{40}$$
Then
$$E(A) = 2\beta^2\rho^3(1-\rho)^2(1+\rho)(1+\rho+\rho^2) + 4\sigma^2, \qquad E(B) = 2\beta^2\rho^2(1-\rho)^2(1+\rho)(1+\rho+\rho^2) + 4\sigma^2. \tag{41}$$

By the use of (35), (36), the variance is evaluated as
$$V(r_h) = \frac{\phi^2}{\rho^2(1-\rho)^2}\,\frac{3+4\rho+4\rho^2+4\rho^3+3\rho^4}{2(1+\rho+\rho^2)^2}, \tag{42}$$
exactly the same as for $r_T$.

The estimate is not identical with $r_T$, and it has a bias of opposite sign; by (10), the
expectation of $r_h$ can be evaluated as
$$E(r_h) = \rho + \frac{\phi^2}{\rho^2(1-\rho)^2}\,\frac{2+\rho+7\rho^2+5\rho^3+3\rho^4}{2(1+\rho)(1+\rho+\rho^2)^2}. \tag{43}$$
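
The simpler Hartley-type estimator can be computed in the same manner as Taylor's. The sketch below regresses the differences on the sums and converts the slope $b$, which by (39) estimates $-(1-\rho)/(1+\rho)$, into $(1+b)/(1-b)$; it then checks the result against the explicit ratio (40). Names and numerical values are illustrative.

```python
def hartley_simple_estimate(y):
    """Regress (y[i+1] - y[i]) on (y[i+1] + y[i]); return (1 + b)/(1 - b)."""
    d = [b - a for a, b in zip(y, y[1:])]
    s = [b + a for a, b in zip(y, y[1:])]
    m = len(d)
    num = m * sum(u * v for u, v in zip(d, s)) - sum(d) * sum(s)
    den = m * sum(v * v for v in s) - sum(s) ** 2
    # with slope b = num/den, (1 + b)/(1 - b) = (den + num)/(den - num)
    return (den + num) / (den - num)

def hartley_simple_n4(y1, y2, y3, y4):
    """Equation (40) for n = 4."""
    a = (2*y4**2 - 3*y4*y2 - y4*y1 + y3**2 - y3*y2 - y3*y1
         + y2**2 + 2*y2*y1)
    b = (2*y4*y3 - y4*y2 - y4*y1 + y3**2 - y3*y2 - 3*y3*y1
         + y2**2 + 2*y1**2)
    return a / b

exact = [10.0 - 6.0 * 0.5**i for i in range(1, 5)]   # alpha = 10, beta = 6, rho = 0.5
noisy = [1.0, 2.5, 2.9, 3.6]

print(hartley_simple_estimate(exact))
print(hartley_simple_estimate(noisy), hartley_simple_n4(*noisy))
```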
The method that Hartley in fact advocated was not based so directly on (39), but involved 
a more complicated regression of the y; on partial sums of y;. I have been unable to under- 
stand what particular advantage can be claimed for this procedure on the evidence that 
Hartley supplies. Viewed as a regression calculation, it is considerably more laborious than 
either of those considered above, because of the various partial sums that must be formed. 
So far as its precision is concerned, the construction by way of a regression is irrelevant, 
because it takes no account of the pattern of the errors; the estimator is in reality just one 
more ratio of two quadratics, and its merit can be judged only by starting from this point. 

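For n = 4, Hartley's formulae can be arranged to give

$$r_H = \frac{A}{B} = \frac{3y_4^2 + y_4y_3 - 5y_4y_2 - 2y_4y_1 + y_3^2 - 2y_3y_2 - y_3y_1 + 2y_2^2 + 3y_2y_1}{3y_4y_3 - y_4y_2 - 2y_4y_1 + 2y_3^2 - 2y_3y_2 - 5y_3y_1 + y_2^2 + y_2y_1 + 3y_1^2}. \tag{44}$$

For this ratio

$$\begin{aligned}
E(A) &= \beta^2\rho^3(1-\rho)^2(1+\rho)(3+4\rho+3\rho^2) + 6\sigma^2,\\
E(B) &= \beta^2\rho^2(1-\rho)^2(1+\rho)(3+4\rho+3\rho^2) + 6\sigma^2.
\end{aligned} \tag{45}$$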
The variance is

$$V(r_H) = \frac{\phi^2}{\rho^2(1-\rho)^2}\,\frac{2(7+12\rho+12\rho^2+12\rho^3+7\rho^4)}{(3+4\rho+3\rho^2)^2}. \tag{46}$$

The expectation of the estimator is

$$E(r_H) = \rho + \frac{\phi^2}{\rho^2(1-\rho)^2}\,\frac{2(3+4\rho+20\rho^2+16\rho^3+7\rho^4)}{(1+\rho)(3+4\rho+3\rho^2)^2}. \tag{47}$$


8. GENERAL QUADRATIC ESTIMATOR 


The analysis of the preceding two sections suggests that the most general quadratic estimator
of the form

$$r_Q = \sum_{i,j} p_{ij}\,y_iy_j \Big/ \sum_{i,j} q_{ij}\,y_iy_j \tag{48}$$

that falls within the class of estimators of § 2 might usefully be studied. The conditions that
the expectations of numerator and denominator of $r_Q$ shall be in the ratio $\rho$, except for
multiples of $\sigma^2$ in each, place a number of linear constraints on the coefficients that enable
them all to be expressed in terms of five arbitrary quantities. These expressions have not
been derived. By imposing the additional conditions that

$$\sum_i p_{ii} = 0, \qquad \sum_i q_{ii} = 0,$$

the terms in $\sigma^2$ in numerator and denominator are eliminated, and therefore

$$\rho = E(A)/E(B). \tag{49}$$


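Under these more stringent conditions, only three arbitrary quantities remain; the coefficients
in (48) may be written

$$\begin{array}{lll}
\text{Coefficient of} & \text{Numerator} & \text{Denominator}\\
y_1^2 & 0 & 2\mu_1\\
y_1y_2 & 2\mu_1 & -4\mu_1+2\mu_2\\
y_1y_3 & -3\mu_1+2\mu_2-\mu_3+2\mu_4 & 2\mu_1-3\mu_2+2\mu_3-\mu_4\\
y_1y_4 & \mu_1-2\mu_2+\mu_3-2\mu_4 & -2\mu_1+\mu_2-2\mu_3+\mu_4\\
y_2^2 & -\mu_1+\mu_3-2\mu_4 & -\mu_2+\mu_4\\
y_2y_3 & \mu_1-2\mu_2+\mu_3+2\mu_4 & 2\mu_1+\mu_2-2\mu_3+\mu_4\\
y_2y_4 & -\mu_1+2\mu_2-3\mu_3+2\mu_4 & 2\mu_1-\mu_2+2\mu_3-3\mu_4\\
y_3^2 & \mu_1-\mu_3 & -2\mu_1+\mu_2-\mu_4\\
y_3y_4 & 2\mu_3-4\mu_4 & 2\mu_4\\
y_4^2 & 2\mu_4 & 0
\end{array}$$

where the ratios $\mu_1 : \mu_2 : \mu_3 : \mu_4$ are arbitrary. Then

$$\begin{aligned}
E(A) &= 2\beta^2\rho^3(1-\rho)^2(\mu_1+\mu_2\rho+\mu_3\rho^2+\mu_4\rho^3),\\
E(B) &= 2\beta^2\rho^2(1-\rho)^2(\mu_1+\mu_2\rho+\mu_3\rho^2+\mu_4\rho^3).
\end{aligned} \tag{50}$$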
Heavy algebra leads to the asymptotic variance formula

$$V(r_Q) = \frac{\phi^2}{\rho^2(1-\rho)^2}\,\frac{\sum_i\sum_j C_{ij}\,\mu_i\mu_j}{2(\mu_1+\mu_2\rho+\mu_3\rho^2+\mu_4\rho^3)^2}, \tag{51}$$

where the sum runs over all i and j, with $C_{ji} = C_{ij}$, and

$$\begin{aligned}
C_{11} &= 7+22\rho+42\rho^2+42\rho^3+33\rho^4+12\rho^5+4\rho^6,\\
C_{12} &= -(4+13\rho+20\rho^2+20\rho^3+14\rho^4+5\rho^5+2\rho^6),\\
C_{13} &= 2(1+4\rho+8\rho^2+11\rho^3+10\rho^4+5\rho^5+2\rho^6),\\
C_{14} &= -2(2+9\rho+17\rho^2+23\rho^3+17\rho^4+9\rho^5+2\rho^6),\\
C_{22} &= 4+8\rho+11\rho^2+10\rho^3+6\rho^4+2\rho^5+\rho^6,\\
C_{23} &= -2(1+2\rho+4\rho^2+5\rho^3+4\rho^4+2\rho^5+\rho^6),\\
C_{24} &= 2(2+5\rho+10\rho^2+11\rho^3+8\rho^4+4\rho^5+\rho^6),\\
C_{33} &= 1+2\rho+6\rho^2+10\rho^3+11\rho^4+8\rho^5+4\rho^6,\\
C_{34} &= -(2+5\rho+14\rho^2+20\rho^3+20\rho^4+13\rho^5+4\rho^6),\\
C_{44} &= 4+12\rho+33\rho^2+42\rho^3+42\rho^4+22\rho^5+7\rho^6.
\end{aligned}$$
The next step might appear to be the minimization of this variance by suitable choice of
the $\mu_i$. Not only would this involve very laborious algebra, but the result, the $\mu_i$ expressed
as complicated functions of $\rho$, would be of limited practical interest. An alternative is first
to try to constrain the $\mu_i$ so that (51) agrees with the known minimal values as $\rho$ approaches
0 or 1, namely

$$\frac{3\phi^2}{2\rho^2(1-\rho)^2} \quad\text{and}\quad \frac{\phi^2}{\rho^2(1-\rho)^2} \quad\text{respectively,}$$

as may be seen from (14). This is achieved by taking

$$\mu_2 = 0, \qquad \mu_3 = -2(\mu_1-\mu_4).$$

Minimization of (51) is now readily shown to require

$$\frac{\mu_4}{\mu_1} = \frac{4+9\rho+18\rho^2+17\rho^3+12\rho^4+4\rho^5}{1+4\rho+7\rho^2+10\rho^3+6\rho^4+4\rho^5}, \tag{52}$$

the value of which declines monotonically from 4 at $\rho = 0$ to 2 at $\rho = 1$. At $\rho = 0.5$, (52)
becomes $13\mu_4 = 32\mu_1$,

and evidently the value is nearly constant when $\rho$ is in the upper part of its range. Hence,
with the condition of being optimal at the extremes of $\rho$, the simple values

$$\mu_1 = 1, \quad \mu_2 = 0, \quad \mu_3 = 2, \quad \mu_4 = 2 \tag{53}$$

seem likely to be fairly good throughout the range.
These give

$$r_Q = \frac{A}{B} = \frac{4y_4^2 - 4y_4y_3 - 3y_4y_2 - y_4y_1 - y_3^2 + 7y_3y_2 - y_3y_1 - 3y_2^2 + 2y_2y_1}{4y_4y_3 - 4y_4y_1 - 4y_3^2 + 4y_3y_1 + 2y_2^2 - 4y_2y_1 + 2y_1^2}. \tag{54}$$

The expectations of numerator and denominator can be read by inserting (53) in (50),
and from (51)

$$V(r_Q) = \frac{\phi^2}{\rho^2(1-\rho)^2}\,\frac{3-2\rho+14\rho^2-6\rho^3+29\rho^4-4\rho^5+16\rho^6}{2(1+2\rho^2+2\rho^3)^2}. \tag{55}$$

The estimator $r_Q$ is certainly biased, but the magnitude of the bias has not been examined.
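The reduction of the general variance formula (51) to the special case (55) can be checked numerically. The sketch below is an illustration only: it assumes that the double sum in (51) runs over all ordered pairs (i, j) with $C_{ji} = C_{ij}$, the polynomial coefficients are as read from the list following (51), and all function names are hypothetical.

```python
# Coefficients of 1, rho, ..., rho^6 in each C_ij, as read from the list after (51).
C = {
    (1, 1): [7, 22, 42, 42, 33, 12, 4],
    (1, 2): [-4, -13, -20, -20, -14, -5, -2],
    (1, 3): [2, 8, 16, 22, 20, 10, 4],
    (1, 4): [-4, -18, -34, -46, -34, -18, -4],
    (2, 2): [4, 8, 11, 10, 6, 2, 1],
    (2, 3): [-2, -4, -8, -10, -8, -4, -2],
    (2, 4): [4, 10, 20, 22, 16, 8, 2],
    (3, 3): [1, 2, 6, 10, 11, 8, 4],
    (3, 4): [-2, -5, -14, -20, -20, -13, -4],
    (4, 4): [4, 12, 33, 42, 42, 22, 7],
}


def poly(coeffs, rho):
    """Evaluate a polynomial given its coefficients in ascending powers of rho."""
    return sum(a * rho**k for k, a in enumerate(coeffs))


def multiplier_51(mu, rho):
    """Multiplier of phi^2 / {rho^2 (1 - rho)^2} in (51), summing over all ordered (i, j)."""
    total = 0.0
    for i in range(1, 5):
        for j in range(1, 5):
            key = (min(i, j), max(i, j))  # C is taken to be symmetric
            total += poly(C[key], rho) * mu[i - 1] * mu[j - 1]
    s = poly(mu, rho)                     # mu1 + mu2 rho + mu3 rho^2 + mu4 rho^3
    return total / (2 * s * s)


def multiplier_55(rho):
    """Multiplier of phi^2 / {rho^2 (1 - rho)^2} in (55)."""
    return poly([3, -2, 14, -6, 29, -4, 16], rho) / (2 * (1 + 2 * rho**2 + 2 * rho**3) ** 2)


for rho in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(rho, multiplier_51((1, 0, 2, 2), rho), multiplier_55(rho))
```

With the values (53) the two functions agree at every $\rho$ tried, and the multipliers lie between 1.5 at $\rho = 0$ and 1 at $\rho = 1$, the known minimal values.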

Unfortunately, neither the general form of the quadratic estimator for n = 4, nor the 
study of optimal conditions, discloses any hint of appropriate further generalizations to 
higher values of n. 
9. INCREASING VARIANCE MODEL


When equation (1) represents a biological growth curve, x is a measure of time, and it may 
be appropriate to incorporate into the model a variance that changes with time. One way 
of doing this is by means of a continuous autoregressive scheme in which the expected rate 
of growth at any time is given by a differential equation corresponding to (1), but in each 
element of time individuals are subject to an error distribution. This will produce a corre- 
lation between successive observations in respect of their deviations from the average 
curve as well as a steadily increasing total variance. The theory that follows in this section 
has analogies with the theory of Brownian movement, based upon Langevin’s equation 
(Chandrasekhar, 1943). 


It is useful to write

$$\log\rho = -\gamma \tag{56}$$

and

$$\rho^x = u. \tag{57}$$

Then, from equation (1),

$$\frac{\partial Y}{\partial x} = \gamma(\alpha - Y). \tag{58}$$


Define $K_y(\theta,x)$ as the cumulant generating function of y for a particular value of the
independent variate, x. Define also $L(\theta,x)\,dx$ as the cumulant generating function for the
distribution of the additional 'error' acquired by an individual in the time interval (x, x + dx).

Express the condition that the cumulant generating function at time (x + dx) is the sum
of the functions at x for the variate (y + dy) and for the error, or

$$K_y(\theta, x+dx) = K_{y+dy}(\theta,x) + L(\theta,x)\,dx \tag{59}$$

where $dy = \gamma(\alpha - y)\,dx$. Then

$$\begin{aligned}
K_y(\theta,x) + \frac{\partial K_y}{\partial u}\frac{\partial u}{\partial x}\,dx
&= K_{y+\gamma(\alpha-y)dx}(\theta,x) + L(\theta,x)\,dx\\
&= \alpha\gamma\,dx\,(i\theta) + K_y(\theta,x) - \gamma(i\theta)\,dx\,\frac{\partial K_y}{\partial(i\theta)} + L(\theta,x)\,dx.
\end{aligned}$$

Hence, in the limit, since $\partial u/\partial x = -\gamma u$,

$$u\frac{\partial K_y}{\partial u} - (i\theta)\frac{\partial K_y}{\partial(i\theta)} + \alpha(i\theta) + \gamma^{-1}L(\theta,x) = 0. \tag{60}$$

Moreover, if time is measured from the point at which the error begins to take effect, this
differential equation is subject to the end condition

$$K_y(\theta,0) = (\alpha-\beta)(i\theta). \tag{61}$$


(i) Suppose now that the error increment is normally distributed and a constant
independent of y; that is to say

$$L(\theta,x) = \tfrac{1}{2}\sigma^2(i\theta)^2. \tag{62}$$

The solution of (60) subject to (61) is, as might almost be guessed,

$$K_y(\theta,x) = (\alpha-\beta u)(i\theta) + \frac{\sigma^2}{2\gamma}(1-u^2)\frac{(i\theta)^2}{2}. \tag{63}$$

Thus y is normally distributed, with expectation still given by equation (1) but with
variance now expressed by

$$V(y) = \frac{\sigma^2}{2\gamma}(1-\rho^{2x}), \tag{64}$$

and tending to the limit $\sigma^2/2\gamma$ as x becomes large.
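The variance formula (64) can be checked by integrating the differential equation for the second cumulant that follows from (60) and (62), namely $dV/dx = \sigma^2 - 2\gamma V$ with $V = 0$ at $x = 0$. That equation is a step not written out in the text, and the sketch below, with hypothetical function names and illustrative parameter values, simply integrates it by a classical fourth-order Runge-Kutta scheme and compares the result with the closed form.

```python
import math


def variance_closed_form(x, sigma2, gamma):
    """Equation (64): V(y) = (sigma^2 / 2 gamma) (1 - rho^(2x)), with rho = exp(-gamma)."""
    rho = math.exp(-gamma)
    return sigma2 / (2 * gamma) * (1 - rho ** (2 * x))


def variance_by_rk4(x_end, sigma2, gamma, steps=2000):
    """Integrate dV/dx = sigma^2 - 2 gamma V from V(0) = 0 by classical Runge-Kutta."""
    def f(v):
        return sigma2 - 2 * gamma * v

    h = x_end / steps
    v = 0.0
    for _ in range(steps):
        k1 = f(v)
        k2 = f(v + 0.5 * h * k1)
        k3 = f(v + 0.5 * h * k2)
        k4 = f(v + h * k3)
        v += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return v


# Illustrative values only.
sigma2, gamma = 2.0, 0.5
for x in (0.5, 1.0, 3.0, 10.0):
    print(x, variance_by_rk4(x, sigma2, gamma), variance_closed_form(x, sigma2, gamma))
```

For large x both values approach $\sigma^2/2\gamma$, the limit stated in the text.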
(ii) A second possibility is to have a normally distributed error increment whose magnitude
is proportional to the expectation of y. If $\kappa_s(u)$ is written for the sth cumulant, then

$$L(\theta,x) = \tfrac{1}{2}\sigma^2\kappa_1(u)(i\theta)^2. \tag{65}$$

Equating coefficients of powers of $i\theta$ in (60) then gives

$$\begin{aligned}
u\kappa_1'(u) - \kappa_1(u) + \alpha &= 0,\\
u\kappa_2'(u) - 2\kappa_2(u) + (\sigma^2/\gamma)\kappa_1(u) &= 0,\\
u\kappa_s'(u) - s\kappa_s(u) &= 0 \quad\text{for } s \ge 3,
\end{aligned}$$

whence, in virtue of the end condition (61),

$$K_y(\theta,x) = (\alpha-\beta u)(i\theta) + \frac{\sigma^2}{2\gamma}(1-u)\{\alpha+(\alpha-2\beta)u\}\frac{(i\theta)^2}{2}. \tag{66}$$

Thus the expectation of y is the same, but now the variance is

$$V(y) = \frac{\sigma^2}{2\gamma}(1-\rho^x)\{\alpha+(\alpha-2\beta)\rho^x\}. \tag{67}$$

(iii) The hypothesis that the increment in variance is proportional to the square of the
expectation of y does not lead to a solution.

(iv) Yet one more possibility worth mentioning is that of a normally distributed variance
increment that decreases in proportion to the amount by which E(y) falls below its asymptote.
This requires

$$L(\theta,x) = \tfrac{1}{2}\sigma^2\{\alpha-\kappa_1(u)\}(i\theta)^2 \tag{68}$$

which gives rise to

$$K_y(\theta,x) = (\alpha-\beta u)(i\theta) + \frac{\sigma^2\beta}{\gamma}\,u(1-u)\frac{(i\theta)^2}{2}. \tag{69}$$

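Only the first of these models will be used further in this paper. Equation (64) for the
variance can easily be generalized to show the covariance of the y values at times $x_1$, $x_2$,
which is

$$C(y_1,y_2) = \frac{\sigma^2}{2\gamma}(\rho^{x_2-x_1}-\rho^{x_1+x_2}), \quad\text{where } x_2 > x_1. \tag{70}$$

If observations are made at unit intervals of x, a transformation of values of y greatly
simplifies the algebra needed in the study of the estimators of $\rho$. Suppose that at times
X, X+1, X+2, ... the observations are $y_1, y_2, y_3, \ldots$. Write

$$\begin{aligned}
z_1 &= y_1,\\
z_i &= y_i - \rho y_{i-1} \quad\text{for } i > 1.
\end{aligned} \tag{71}$$

Then it is easily verified that

$$\begin{aligned}
E(z_1) &= \alpha-\beta\rho^X,\\
E(z_i) &= \alpha(1-\rho) \quad\text{for } i > 1,
\end{aligned} \tag{72}$$

and also that

$$\begin{aligned}
V(z_1) &= \frac{\sigma^2}{2\gamma}(1-\rho^{2X}),\\
V(z_i) &= \frac{\sigma^2}{2\gamma}(1-\rho^2) \quad\text{for } i > 1,\\
C(z_i,z_j) &= 0 \quad\text{for } j > i > 0.
\end{aligned} \tag{73}$$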
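The statement that the transformation (71) removes the correlations can be verified directly from the covariance formula (70). The sketch below is an illustration with hypothetical function names and made-up parameter values; it builds the covariance matrix of the y values at times X, X+1, ..., applies the linear transformation to z, and prints the resulting matrix, which should be diagonal with the entries given in (73).

```python
def cov_y(rho, v, X, n):
    """Covariance matrix of y at times X, X+1, ..., X+n-1 from (70); v stands for sigma^2/(2 gamma)."""
    xs = [X + i for i in range(n)]
    return [[v * (rho ** abs(xj - xi) - rho ** (xi + xj)) for xj in xs] for xi in xs]


def transform_matrix(rho, n):
    """Matrix T with z = T y, i.e. z_1 = y_1 and z_i = y_i - rho * y_(i-1)."""
    t = [[0.0] * n for _ in range(n)]
    t[0][0] = 1.0
    for i in range(1, n):
        t[i][i] = 1.0
        t[i][i - 1] = -rho
    return t


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def cov_z(rho, v, X, n):
    """Covariance matrix of z, computed as T * Cov(y) * T'."""
    t = transform_matrix(rho, n)
    return matmul(matmul(t, cov_y(rho, v, X, n)), transpose(t))


# Illustrative values only.
rho, v, X, n = 0.7, 2.0, 3, 5
for row in cov_z(rho, v, X, n):
    print(["%.6f" % entry for entry in row])
```

The first diagonal entry equals $v(1-\rho^{2X})$, the others equal $v(1-\rho^2)$, and the off-diagonal entries vanish to rounding error.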
10. PATTERSON’S LINEAR ESTIMATOR


Now consider once more the general linear estimator of $\rho$, equation (16), for a set of n
observations made at times X, X+1, ..., X+n-1. Then

$$A - \rho B = \mu_1z_n + \mu_2z_{n-1} + \ldots + \mu_{n-1}z_2.$$

Write, here and subsequently,

$$\psi^2 = \frac{\sigma^2}{2\gamma\beta^2}. \tag{74}$$

From (11), the asymptotic variance of the estimator is

$$V(r_P) = \frac{\psi^2(1-\rho^2)}{\rho^{2X}}\,\frac{\mu_1^2+\mu_2^2+\ldots+\mu_{n-1}^2}{(\mu_1\rho^{n-2}+\mu_2\rho^{n-3}+\ldots+\mu_{n-1})^2}. \tag{75}$$
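This expression is minimized with respect to the $\mu_i$ by taking

$$\mu_i = \rho^{n-1-i} - \frac{1}{n-1}\sum_{k=0}^{n-2}\rho^k, \tag{76}$$

so that the minimal variance attainable by a linear estimator is

$$V_{\min}(r_P) = \frac{\psi^2(1-\rho^2)}{\rho^{2X}}\Bigg/\left\{\sum_{k=0}^{n-2}\rho^{2k} - \frac{1}{n-1}\left(\sum_{k=0}^{n-2}\rho^k\right)^2\right\}. \tag{77}$$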
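The minimization can be illustrated numerically. The sketch below assumes the reading of (75) in which the variance is $\psi^2(1-\rho^2)/\rho^{2X}$ times $\sum\mu_i^2/(\sum\mu_i\rho^{n-1-i})^2$, with the $\mu_i$ constrained to sum to zero, and takes the optimal weights to be the powers of $\rho$ measured from their mean; the function names are hypothetical.

```python
def optimal_weights(rho, n):
    """Zero-sum weights mu_1..mu_(n-1): the powers rho^(n-1-i) measured from their mean."""
    c = [rho ** (n - 1 - i) for i in range(1, n)]
    mean = sum(c) / len(c)
    return [ci - mean for ci in c]


def variance_factor(mu, rho):
    """Factor multiplying psi^2 / rho^(2X) in the variance (75) for given weights mu."""
    n = len(mu) + 1
    num = sum(m * m for m in mu)
    den = sum(m * rho ** (n - 1 - i) for i, m in enumerate(mu, start=1)) ** 2
    return (1 - rho ** 2) * num / den


rho = 0.5
best = variance_factor(optimal_weights(rho, 4), rho)
other = variance_factor([1.0, 0.0, -1.0], rho)   # another zero-sum choice, for comparison
print(best, other)
```

For n = 4 the optimal factor equals $3(1+\rho)/\{2(1-\rho^3)\}$, and any other zero-sum choice of weights gives a larger value.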


Moreover, as will be shown in §11, the variance of the optimal linear estimator tends to 
equality with that of the maximum likelihood estimator, and the estimator is therefore 
fully efficient. 
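For n = 4, the estimator may be written in the form of equation (18), and the variance is

$$V(r_P) = \frac{2\psi^2}{\rho^{2X}(1-\rho)}\,\frac{(\lambda^2-\lambda+1)(1+\rho)}{(\lambda+\rho)^2}. \tag{78}$$

The variance is minimized by

$$\lambda = \frac{2+\rho}{1+2\rho}, \tag{79}$$

which leads to

$$V_{\min}(r_P) = \frac{3\psi^2(1+\rho)}{2\rho^{2X}(1-\rho^3)}. \tag{80}$$

Evidently $\lambda = 1.25$ is again a good approximation at moderate values of $\rho$, corresponding
exactly to the optimal at $\rho = 0.5$ instead of at $\rho = 0.357$; for large $\rho$, $\lambda = 1$ is superior and
for small $\rho$, $\lambda = 2$ is superior. The variances for the estimators (21), (23), (25) are

$$V(r_{P,1.25}) = \frac{42\psi^2(1+\rho)}{\rho^{2X}(1-\rho)(5+4\rho)^2}, \tag{81}$$

$$V(r_{P,1}) = \frac{2\psi^2}{\rho^{2X}(1-\rho^2)}, \tag{82}$$

$$V(r_{P,2}) = \frac{6\psi^2(1+\rho)}{\rho^{2X}(1-\rho)(2+\rho)^2}. \tag{83}$$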
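The dependence of the variance on $\lambda$ is easily tabulated. The following sketch, with hypothetical function names, evaluates the multiplier of $\psi^2/\{\rho^{2X}(1-\rho)\}$ implied by (78) for the three values of $\lambda$ of particular interest and compares it with the minimal value implied by (80).

```python
def variance_multiplier(lam, rho):
    """Multiplier of psi^2 / {rho^(2X) (1 - rho)} in the variance (78)."""
    return 2 * (lam * lam - lam + 1) * (1 + rho) / (lam + rho) ** 2


def optimal_lambda(rho):
    """Value of lambda that minimizes (78), equation (79)."""
    return (2 + rho) / (1 + 2 * rho)


def minimal_multiplier(rho):
    """Multiplier corresponding to the minimal variance (80)."""
    return 3 * (1 + rho) / (2 * (1 + rho + rho * rho))


for k in range(11):
    rho = k / 10
    print("%.1f  %.4f  %.4f  %.4f  %.4f" % (
        rho,
        minimal_multiplier(rho),
        variance_multiplier(1.25, rho),
        variance_multiplier(1.0, rho),
        variance_multiplier(2.0, rho),
    ))
```

The printed columns correspond to the fully efficient estimators and to $\lambda$ = 1.25, 1 and 2 respectively, and can be compared with the entries of Table 3.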
The expectation of the general linear estimator, corresponding to equation (27) for the
constant variance model, may be written (to the first-order terms in the bias)

$$E(r_P) = \rho + \frac{\psi^2}{\rho^{2X}(1-\rho)}\,\frac{(1+\rho)\{(\lambda-1)^2+\lambda\rho\}}{(\lambda+\rho)^2}. \tag{84}$$

For the three values of $\lambda$ of particular interest

$$E(r_{P,1.25}) = \rho + \frac{\psi^2}{\rho^{2X}(1-\rho)}\,\frac{(1+\rho)(1+20\rho)}{(5+4\rho)^2}, \tag{85}$$

$$E(r_{P,1}) = \rho + \frac{\psi^2}{\rho^{2X}(1-\rho)}\,\frac{\rho}{1+\rho}, \tag{86}$$

$$E(r_{P,2}) = \rho + \frac{\psi^2}{\rho^{2X}(1-\rho)}\,\frac{(1+\rho)(1+2\rho)}{(2+\rho)^2}. \tag{87}$$


11. TAYLOR’S ESTIMATOR


The form of equation (31), taken in conjunction with the properties of $z_i$ stated as equations
(71), (72), (73), indicates that Taylor's autoregressive calculations correspond to
maximum likelihood estimation for $\rho$, and indeed for $\rho$ and $\alpha$ simultaneously. In fact, all
the information on $\rho$ contained in the observations is comprised in the statement that the
$z_i$ (i = 2, 3, ..., n) are normally distributed with equal expectations and equal variances
and are uncorrelated. The magnitude of their expectation involves another parameter, $\alpha$,
and minimization of the sum of squares of the $z_i$ (i > 1) from their mean is therefore the same
as maximum likelihood estimation of $\rho$, $\alpha$. This calculation is immediately seen to be the
same as that for Taylor's calculation of an unweighted linear regression of $y_{i+1}$ on $y_i$, and
therefore

$$r_T = \hat r \tag{88}$$

for every value of n. The maximum likelihood estimate of $\beta$ is then obtainable by equating
$z_1$ to its expectation, and $\sigma^2/2\gamma$ can be estimated from the variances of the $z_i$.
At first sight, this appears to give the variance of $r_T$ as

$$\frac{\sigma^2(1-\rho^2)}{2\gamma\sum_{i=1}^{n-1}(y_i-\bar y)^2},$$

the ordinary formula for the variance of a linear regression coefficient when the individual
residuals have the variance $\sigma^2(1-\rho^2)/(2\gamma)$ shown in (73). However, this needs to be modified
here, since the $y_i$ are themselves the observations, and in order to obtain the asymptotic
variance of $r_T$ the $y_i$ must be replaced by their expectations. The result is obviously identical
with (77), whence it follows that the optimal linear estimator is fully efficient.
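Taylor's calculation is an ordinary unweighted regression of each observation on its predecessor, and can be sketched in a few lines. The function name and the data below are illustrative assumptions; the slope estimates $\rho$ and the intercept estimates $\alpha(1-\rho)$, so that an estimate of $\alpha$ follows at once.

```python
def taylor_estimates(y):
    """Regress y[i+1] on y[i]; return (r_T, estimate of alpha)."""
    x0, x1 = y[:-1], y[1:]
    m = len(x0)
    mean0, mean1 = sum(x0) / m, sum(x1) / m
    sxy = sum((a - mean0) * (b - mean1) for a, b in zip(x0, x1))
    sxx = sum((a - mean0) ** 2 for a in x0)
    r = sxy / sxx                   # slope: estimate of rho
    intercept = mean1 - r * mean0   # estimate of alpha * (1 - rho)
    return r, intercept / (1 - r)


# Illustrative error-free observations from y = alpha - beta * rho^x.
alpha, beta, rho = 10.0, 6.0, 0.6
y = [alpha - beta * rho ** x for x in range(1, 6)]
print(taylor_estimates(y))
```

On error-free data the function recovers $\rho$ and $\alpha$ exactly, apart from rounding; with observational errors in the $y_i$ the slope is subject to the bias discussed in § 13.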

For n = 4, the variance of $r_T$ is thus also given by (80), a result that may be verified by
the alternative laborious algebraic process used in § 12. The expectation of the estimator is

$$E(r_T) = \rho - \frac{\psi^2}{\rho^{2X}(1-\rho)}\,\frac{(1-\rho^2)(1+4\rho+\rho^2)}{2(1+\rho+\rho^2)^2}. \tag{89}$$


12. HARTLEY ESTIMATORS


Here also the estimator obtained from consideration of the regression of $(y_{i+1}-y_i)$ on
$(y_{i+1}+y_i)$ merits examination. For n = 4, the formula for this estimator, $r_h$, is equation
(40). Under the conditions of the increasing variance model, the numerator and denominator
of $r_h$ can be expressed in terms of the $z_i$ and their expectations obtained by the
use of (72), (73) as

$$\begin{aligned}
E(A) &= 2\beta^2\rho^{2X+1}(1-\rho^2)(1-\rho^3) + (\sigma^2/2\gamma)(1-\rho^2)\,[4-\rho+2\rho^4+2\rho(1-\rho^3)(1-\rho^{2X})],\\
E(B) &= 2\beta^2\rho^{2X}(1-\rho^2)(1-\rho^3) + (\sigma^2/2\gamma)(1-\rho^2)\,[2+\rho+2\rho^3+2(1-\rho^3)(1-\rho^{2X})].
\end{aligned} \tag{90}$$

Again, of course, numerator and denominator are of the form specified in § 2. 

Now $(A-\rho B)$ can be written as a quadratic function of the $z_i$, the coefficients being
polynomials in $\rho$. The variances and covariances of all the $z_i^2$ and $z_iz_j$ can be formed from
(72), (73) and simple properties of the normal distribution. Hence, the asymptotic variance
of $r_h$, formula (12), can be evaluated. To the first order

$$V(r_h) = \frac{3\psi^2(1+\rho)}{2\rho^{2X}(1-\rho^3)}, \tag{91}$$

which is identical with (80), and therefore $r_h$ is of full efficiency for n = 4; whether it shares
with $r_T$ the property of full efficiency for all n is not known.
The same procedure can be used to give the asymptotic variance of $r_H$, defined by equation
(44), the estimator that Hartley's procedure gives for n = 4. The result is

$$V(r_H) = \frac{2\psi^2(1+\rho)(7+11\rho+7\rho^2)}{\rho^{2X}(1-\rho)(3+4\rho+3\rho^2)^2}. \tag{92}$$

The biases in $r_h$ and $r_H$ under the increasing variance model have not been investigated, as
there appeared to be no evidence that these estimators possess any special interest.


13. COMPARISON OF ESTIMATORS 


All the variances relating to the constant variance model that were obtained in §§ 5-9
are multiples of

$$\frac{\phi^2}{\rho^2(1-\rho)^2}.$$

The efficiencies of the estimators can therefore easily be compared in terms of the multipliers
of this quantity that occur in the variances of the estimators. Table 1 presents these multipliers for
different $\rho$.

Formulae have been obtained earlier for the biases in these estimators, to the order of
the terms in $\phi^2$. These biases contain the same factor as do the variances, and the remaining
factor is tabulated in Table 2.

Judged by asymptotic variance, Patterson's estimator with $\lambda = 1.25$ compares well with
the maximum likelihood estimator, except that the loss of efficiency approaches 12 % when
$\rho$ is small. If it were practicable to use $\lambda = 2$ instead of $\lambda = 1.25$ whenever $\rho$ was less than
0.15, the maximum loss of efficiency could be kept down to about 3 %. The Taylor estimator,
$r_T$, however, succeeds even better, and without any necessity for some prior knowledge
of $\rho$ to guide the choice between alternative formulae; the loss of efficiency never exceeds
1 %, whatever the value of $\rho$, and becomes negligibly small at either extreme of the range
of $\rho$. The bias is of the same order of magnitude as for Patterson's estimators, though of
opposite sign.

Table 1 may suggest that $r_h$ is as good as $r_T$, in that it has the same asymptotic variance.
Doubtless the two are members of a class all having the same variance function. However,
Table 1. Multipliers of $\phi^2/\{\rho^2(1-\rho)^2\}$ in the asymptotic variances
of various estimators for the constant variance model

                                     Estimator
 rho    r-hat    r_P,1.25   r_P,1     r_P,2     r_T and r_h   r_H      r_Q (eq. (54))

 0.0    1.5000   1.6800     2.0000    1.5000    1.5000        1.5556   1.5000
 0.1    1.3945   1.4616     1.6694    1.4195    1.3977        1.4165   1.4059
 0.2    1.2921   1.3103     1.4444    1.3719    1.2997        1.2999   1.3146
 0.3    1.2048   1.2066     1.2899    1.3497    1.2143        1.2072   1.2263
 0.4    1.1360   1.1368     1.1837    1.3472    1.1450        1.1367   1.1501
 0.5    1.0849   1.0918     1.1111    1.3600    1.0918        1.0851   1.0918
 0.6    1.0487   1.0650     1.0625    1.3846    1.0533        1.0488   1.0514
 0.7    1.0246   1.0516     1.0311    1.4184    1.0271        1.0246   1.0253
 0.8    1.0098   1.0482     1.0123    1.4592    1.0109        1.0098   1.0099
 0.9    1.0022   1.0522     1.0028    1.5054    1.0025        1.0022   1.0022
 1.0    1.0000   1.0617     1.0000    1.5556    1.0000        1.0000   1.0000











Table 2. Multipliers of $\phi^2/\{\rho^2(1-\rho)^2\}$ in the term of order $\phi^2$ for the bias in various estimators
under the constant variance model (a positive bias indicates that the expectation of the
estimator exceeds $\rho$)

                               Estimator
 rho    r_P,1.25   r_P,1     r_P,2     r_T       r_h       r_H

 0.0    0.0400     0.0000    0.2500    -0.5000   1.0000    0.6667
 0.1    0.1783     0.1653    0.3628    -0.5722   0.8025    0.5509
 0.2    0.2794     0.2778    0.4545    -0.5983   0.6842    0.5140
 0.3    0.3538     0.3550    0.5293    -0.5926   0.6150    0.4996
 0.4    0.4086     0.4082    0.5903    -0.5671   0.5748    0.4984
 0.5    0.4490     0.4444    0.6400    -0.5306   0.5510    0.5016
 0.6    0.4785     0.4688    0.6805    -0.4894   0.5360    0.5050
 0.7    0.4997     0.4844    0.7133    -0.4472   0.5253    0.5068
 0.8    0.5146     0.4938    0.7398    -0.4065   0.5164    0.5066
 0.9    0.5246     0.4986    0.7610    -0.3683   0.5082    0.5042
0-5000 
$r_T$ has the merit of being directly computable from the data by a standard regression procedure,
whereas $r_h$ requires the preliminary formation of $(y_{i+1}-y_i)$ and $(y_{i+1}+y_i)$ and also
the final calculation of $r_h$ from an estimator of $(1-\rho)/(1+\rho)$. Moreover, the bias in $r_h$ is
substantially greater. The Hartley estimator, $r_H$, has an asymptotic variance that is somewhat
larger than that of $r_T$ for small values of $\rho$ ($\rho < 0.2$), but for higher values approximates
to $V(\hat r)$ even more closely than does $r_T$. At best, however, the gain is small and scarcely repays
the more cumbersome calculations needed for the Hartley estimator. The estimator $r_Q$ is
also of high efficiency, but has no special merits, and its lack of any obvious generalization
to higher values of n makes it, for the present at least, of little interest.

For the increasing variance model, all the estimators discussed in §§ 10-12 have asymptotic
variances that are multiples of

$$\frac{\psi^2}{\rho^{2X}(1-\rho)}.$$

Table 3 shows the multipliers, and so enables the efficiencies to be compared.


Table 3. Multipliers of $\psi^2/\{\rho^{2X}(1-\rho)\}$ in the asymptotic variance
of various estimators for the increasing variance model

                               Estimator
 rho    r-hat, r_T, r_h   r_P,1.25   r_P,1     r_P,2     r_H

 0.0    1.5000            1.6800     2.0000    1.5000    1.5556
 0.1    1.4865            1.5844     1.8182    1.4966    1.5278
 0.2    1.4516            1.4982     1.6667    1.4876    1.4806
 0.3    1.4029            1.4204     1.5385    1.4745    1.4223
 0.4    1.3462            1.3499     1.4286    1.4583    1.3584
 0.5    1.2857            1.2857     1.3333    1.4400    1.2930
 0.6    1.2245            1.2272     1.2500    1.4201    1.2285
 0.7    1.1644            1.1736     1.1765    1.3992    1.1663
 0.8    1.1066            1.1243     1.1111    1.3776    1.1073
 0.9    1.0517            1.0790     1.0526    1.3555    1.0518
 1.0    1.0000            1.0370     1.0000    1.3333    1.0000





Formulae for biases have not been obtained for $r_h$ and $r_H$; for the other estimators, the
biases are tabulated in Table 4, again as multipliers of the factor that occurs in the variances.

Perhaps the most surprising feature of Table 3 is that, despite the very different model
used, the general pattern is very similar to that of Table 1. The Patterson estimator with
$\lambda = 1.25$ is good above $\rho = 0.2$; if it were practicable to use instead $\lambda = 2$ when $\rho < 0.2$,
the maximum loss of efficiency would again be about 3 %. Now, however, both $r_T$ and $r_h$
are of full efficiency (though not identical with one another); $r_H$ has a loss of efficiency that
approaches 4 % for small $\rho$ but becomes vanishingly small if $\rho$ approaches 1.0.

All that has been said so far in this section relates only to n = 4. Patterson's linear
estimators have the advantage that, for the constant variance model, they have been
studied for values of n up to 12; although their efficiency relative to maximum likelihood
declines as n increases, it does not become intolerably bad, and there is no difficulty in
forming an expression for the asymptotic variance. A similar study for higher values of n
could be made fairly easily for the increasing variance model.

On the other hand, the Taylor estimator has been shown to be not merely of high efficiency,
but actually identical with the maximum likelihood estimator for the increasing variance
model. Taken in conjunction with the close similarity between the two models in respect of
the relative efficiencies of different estimators when n = 4, and in particular the high
efficiency of $r_T$ for the constant variance model, it seems reasonable to conclude that $r_T$
will be an estimator of high efficiency for this model for all n. It may well continue to have


Table 4. Multipliers of $\psi^2/\{\rho^{2X}(1-\rho)\}$ in the term of order $\psi^2$ for the bias in various estimators
under the increasing variance model (a positive bias indicates that the expectation of the
estimator exceeds $\rho$)

                         Estimator
 rho    r_P,1.25   r_P,1     r_P,2     r_T

 0.0    0.0400     0.0000    0.2500    -0.5000
 0.1    0.1132     0.0909    0.2993    -0.5665
 0.2    0.1784     0.1667    0.3471    -0.5744
 0.3    0.2367     0.2308    0.3932    -0.5393
 0.4    0.2893     0.2857    0.4375    -0.4763
 0.5    0.3367     0.3333    0.4800    -0.3980
 0.6    0.3798     0.3750    0.5207    -0.3132
 0.7    0.4191     0.4118    0.5597    -0.2281
 0.8    0.4551     0.4444    0.5969    -0.1463
 0.9    0.4881     0.4737    0.6326    -0.0700
 1.0    0.5185     0.5000    0.6667     0.0000








the property of being more efficient than the Patterson estimator over almost the whole
range of $\rho$.* Mathematical demonstration of this would be tedious for n = 5 and excessively
laborious for higher n, unless a more general method than that of this paper is found. Taylor's
method has a further advantage over Patterson's, in that the calculation of the regression
of $y_{i+1}$ on $y_i$ simultaneously provides an estimate of $\alpha(1-\rho)$, and therefore of $\alpha$, whereas
Patterson's calculation of r as a ratio of two linear functions must be followed by a further
calculation of the linear regression of y on $r^x$ in order to obtain estimates of $\alpha$ and $\beta$. If
equation (1) represents a growth curve, $\alpha$ may be of substantially greater practical interest
than $\beta$, since it represents the final size attained whereas $\beta$ depends largely on the time at
which observations begin. Equation (39) shows that the regression calculations for $r_h$ lead
equally simply to simultaneous estimation of $\alpha$, but, as already noted, $r_h$ does not appear
to have any advantages over $r_T$. Hartley's method, indeed, gives simultaneous estimates
of the three parameters of equation (1), but the calculations are more laborious than for


* [See, however, the following paper by Mr Patterson.—Editor.] 













the other estimators and there seems to be no reason to suppose that $r_H$ will acquire for
larger values of n any of the superiority relative to $r_T$ and $r_h$ that it evidently lacks for n = 4.

The Taylor method, however, may be open to objection in respect of bias. It is well known
that, if the independent variate in ordinary linear regression calculations is subject to errors
of measurement, the regression coefficient will tend to be underestimated. Dr F. Yates
has pointed out to me that one might expect $r_T$ to be negatively biased on this account;
the fact that, in Tables 2 and 4, $r_T$ is the only estimator to display a negative bias is in
conformity with this. The proportionality of the bias to $\phi^2$ or $\psi^2$ indicates that its relative
importance would decrease if the replication at each value of x were increased and $\sigma^2$
therefore decreased; however, if resources for additional replication were utilized to permit
a narrower spacing of the x values at which y is measured, instead of additional y measurements
at each x, the bias need not diminish, and, indeed, Yates has suggested reasons why
it might even become relatively more important. If observations were made at intervals
of 0.5 in x instead of at unit intervals, for example, the Taylor calculations would lead to
estimation of $\rho^{1/2}$ instead of $\rho$, and thus a bias of the same magnitude as before would correspond
to a proportionately greater bias in $\rho$. (The same argument leads to expectation of
a positive bias in $r_h$.) When $\phi^2$ or $\psi^2$ is very small, the bias is almost certainly too small to
matter; for example, under carefully controlled conditions, Taylor (personal communication)
has found successive observations on a particular measurement in the same animal to have
$\phi^2$ of the order of 0.001 or even 0.0001 under the constant variance model. When the observations
are such that successive values of y must relate to different animals or different
field plots, as in fertilizer response curves, $\phi^2$ might be very much greater and the bias
correspondingly more serious.

The conclusions to be drawn are that, for the estimation of $\rho$, both Patterson's linear
estimation process and Taylor's are of high asymptotic efficiency. For the constant variance
model, if it is essential to have a method for which the asymptotic variance is known, then
Patterson's should be chosen pending further research on Taylor's; if attainment of high
precision is more important than knowing that precision, Taylor's method may sometimes
be preferable. For the increasing variance model, there seems to be no doubt about the
superiority of Taylor's method in respect of the variance of the estimate of $\rho$, as it gives the maximum
likelihood estimator and the variance can be computed by substitution of $r_T$ for $\rho$ in
equation (77). On either model, Taylor's method will not be seriously biased when $\phi^2$ or
$\psi^2$ is very small, but it should be adopted with great caution if $\phi^2$ or $\psi^2$ is large because of
the possible danger of a large negative bias.

Some months after this paper was completed, Mr H. D. Patterson allowed me to see the
typescript of a paper (Patterson, 1958) in which he has developed a more general attack
on the same group of problems. Although his analysis confirms that, for the constant
variance model, the estimator $r_T$ is of high efficiency for n = 5, 6 and 7, he disproves the
speculation above that it might still be more efficient than the linear estimators he had
previously proposed. The advantage remains with $r_T$ at extreme values of $\rho$, but for
$0.3 < \rho < 0.7$ the linear estimator may have an appreciably smaller variance. Still more
disturbing is the size of the bias that attaches to $r_T$ for larger n; for n = 7, the bias may be
ten times as great relative to the variance as it is for n = 4. This fact places a severe limitation
on any circumstances in which $r_T$ might advisably be used. For the increasing variance
model, the position has yet to be fully explored, and despite the full efficiency of $r_T$ the
biases may so increase for larger n as to make the method seldom trustworthy. Moreover,
as Patterson rightly emphasizes, in practice observational data will usually be subject to
random errors as well as to any arising from an autoregressive scheme, so that the truth may
lie somewhere between the two mathematical models.


I am indebted to Dr St C. S. Taylor, of the Animal Breeding Research Organization, for
arousing my interest in this problem and for stimulating discussions upon it, to Dr F. Yates 
for permission to use his comments on bias, and to Mr H. D. Patterson for showing me the 
typescript of his new paper. 
THE USE OF AUTOREGRESSION IN FITTING AN
EXPONENTIAL CURVE 


By H. D. PATTERSON 
Rothamsted Experimental Station, Harpenden, Herts 


INTRODUCTION 
The exponential regression curve

$$y = \alpha - \beta\rho^x, \quad\text{where } 0 < \rho < 1 \tag{1}$$

is being increasingly used in statistical work in many branches of biology. Various numerical
methods of fitting this curve have been suggested in recent years for cases in which the y's
are independent and have equal variances and the x's take the values 0, 1, 2, ..., n-1.

Stevens (1951) described a fully efficient least squares method for estimating $\alpha$, $\beta$ and $\rho$,
and the errors of these estimates. The method is one of successive approximation and
requires a reasonably accurate initial estimate of $\rho$. It works well on an electronic computer.
He also provided tables for n = 5, 6 and 7, since extended by S. Lipton, which are very
useful when the computations are carried out on a desk machine. Pimentel Gomes (1953)
provided further tables for n = 4 and n = 5 which permit the least squares estimates of
$\rho$ to be obtained more rapidly than by the iterative procedure.

In a previous paper I pointed out that the least squares estimates of $\rho$, r say, are of the
form

$$r = \frac{\sum_{x=1}^{n-1} w_x(r)\,y_x}{\sum_{x=1}^{n-1} w_x(r)\,y_{x-1}}, \tag{2}$$

where the $w_x(r)$ are complicated functions of r (Patterson, 1956). I there suggested that
replacing r by $\rho_0$ in the $w_x(r)$ would give estimates, $r(\rho_0)$, say, where

$$r(\rho_0) = \frac{\sum_{x=1}^{n-1} w_x(\rho_0)\,y_x}{\sum_{x=1}^{n-1} w_x(\rho_0)\,y_{x-1}}, \tag{3}$$

which are almost fully efficient over a range of values of $\rho$ around $\rho_0$. In fact it was found
that for each of the cases n = 4, 5, 6 and 7 a suitable choice of $\rho_0$ leads to very simple estimates
of reasonably high efficiency over the useful range of $\rho$.

The main uses of this method are: 

(a) to provide initial estimates of p for the method of Stevens (1951); 

(b) to enable rapid checks to be made on assumed values of p. 

Neither of these applications requires the estimation of $\alpha$ and $\beta$. The method is also useful,
in the cases mentioned above, for a complete curve fitting when computing facilities are
limited and full efficiency is not required. Satisfactory estimates of $\alpha$ and $\beta$ can be obtained
by simple regression of y on $r^x$, provided that r is a reasonably efficient and unbiased
estimate of $\rho$.

Since, however, the range of $\rho$ efficiently covered by $r(\rho_0)$ decreases with increasing n,
other simple methods require to be considered. In one very simple and at first sight attractive
method $\rho$ is estimated by the regression of $y_{x+1}$ on $y_x$ in the equation

$$y_{x+1} = \alpha(1-\rho) + \rho y_x. \tag{4}$$

This estimate is one of a family of estimates obtained by a procedure which Hartley (1948)
has termed internal regression. Hartley himself considered a regression equation involving
x and certain partial sums of the $y_x$. This regression equation is rather more difficult to
handle than (4).

Recently Finney (1958) has made a detailed comparison of various estimates of $\rho$ for the
case n = 4. These estimates were:

(a) the simple regression estimate given by equation (4) (suggested to him by St C. S.
Taylor);

(b) a similar estimate obtained from the regression equation

$$y_{x+1} - y_x = \frac{2\alpha(1-\rho)}{1+\rho} - \frac{1-\rho}{1+\rho}(y_{x+1}+y_x); \tag{5}$$

(c) the estimate proposed by Hartley (1948);

(d) the estimates proposed by Patterson (1956);

(e) an estimate given by the ratio of two quadratic functions of the $y_x$, but which is not
expressible as a regression.

Finney found that the estimate (a) gave a good performance for n = 4. He suggested
that it might well continue to have high efficiency for all n, but quoted comments by
F. Yates to the effect that bias in the estimate can be expected to become increasingly
important as n increases.





Finney (1958) considered two models for the errors of the $y_x$, the constant variance model
and a model in which the quantities

$$z_{x+1} = y_{x+1} - \rho y_x \tag{6}$$

are subject to errors which are independent and have equal variance. For this latter model,
estimate (a) is the maximum likelihood estimate of $\rho$.

In the course of the work already referred to I made a study of the behaviour of estimate
(a) under the constant variance model for values of n > 4. This study leads to somewhat
different conclusions than can be drawn from the single case n = 4. A report on the work
appears therefore to be called for; this is the main purpose of the present paper. Although
only estimate (a) will be considered in detail the theoretical results developed below can be
applied to a wide range of estimates including all those mentioned above.

QUADRATIC ESTIMATES OF p 
The estimate of ρ given by the simple regression of y_{x+1} on y_x in equation (4) can be expressed as

$$r = \frac{\sum_{x=1}^{n-1} w'_x\,y_x}{\sum_{x=1}^{n-1} w'_x\,y_{x-1}}, \qquad w'_x = y_{x-1} - \frac{1}{n-1}\sum_{j=0}^{n-2} y_j, \qquad (7)$$

i.e. in the same form as equation (3) but with w'_x, a function of the y_x, replacing w_x, a function
of ρ_0. Estimates of this type, given by ratios of quadratic functions of the y_x, can be referred
to as ‘quadratic’ estimates in order to distinguish them from ‘linear’ estimates of the
type (3).

It is convenient to express A and B in matrix notation as follows: 


H. D. PATTERSON


$$A = y_1' w_0 = y_1' D_0\, y_0, \qquad (9)$$

$$B = y_0' w_0 = y_0' D_0\, y_0, \qquad (10)$$

where

$$y_1' = (y_1\; y_2\; \dots\; y_{n-1}), \qquad y_0' = (y_0\; y_1\; \dots\; y_{n-2}), \qquad w_0' = (w'_1\; w'_2\; \dots\; w'_{n-1}),$$

and D_0 is a matrix of order (n − 1) × (n − 1) with each diagonal element equal to (n − 2)/(n − 1)
and each non-diagonal element equal to −1/(n − 1).
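
The matrix form can be checked numerically. The following is a minimal sketch, not part of the paper: it builds D_0, forms r = A/B from (9) and (10), and compares the result with an ordinary least-squares slope. The function names and the numerical values of α, β, ρ and of the "noisy" observations are illustrative assumptions.

```python
def d0_matrix(m):
    """D_0 of order m = n - 1: diagonal (m - 1)/m, off-diagonal -1/m."""
    return [[(1.0 if i == j else 0.0) - 1.0 / m for j in range(m)]
            for i in range(m)]


def quad_form(u, d, v):
    """Return the scalar u' D v."""
    return sum(u[i] * d[i][j] * v[j]
               for i in range(len(u)) for j in range(len(v)))


def r_simple(y):
    """Estimate r = A / B with A = y1' D0 y0 and B = y0' D0 y0."""
    y0, y1 = y[:-1], y[1:]          # (y_0 .. y_{n-2}) and (y_1 .. y_{n-1})
    d0 = d0_matrix(len(y0))
    return quad_form(y1, d0, y0) / quad_form(y0, d0, y0)


def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys on xs (with an intercept)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    return sxy / sxx


# Error-free curve y_x = alpha - beta * rho**x (illustrative values only).
alpha, beta, rho, n = 10.0, 4.0, 0.6, 5
y_exact = [alpha - beta * rho ** x for x in range(n)]
y_noisy = [1.0, 2.5, 3.1, 3.9, 4.2]   # made-up observations
```

With error-free data `r_simple(y_exact)` returns ρ up to rounding, and for any data it agrees with `ols_slope(y[:-1], y[1:])`, since D_0 simply centres y_0.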
More generally we can consider the regression equation

$$y_{x+1} = \frac{\alpha k(1-\rho)}{k+l\rho} + \frac{\rho}{k+l\rho}\,(k y_x + l y_{x+1}), \qquad (11)$$

where k ≠ 0. The regression coefficient is

$$\frac{y_1' D_0 (k y_0 + l y_1)}{(k y_0' + l y_1')\, D_0\, (k y_0 + l y_1)},$$

so that the estimate of ρ is now

$$r(0, k, l) = \frac{y_1' D_0 (k y_0 + l y_1)}{y_0' D_0 (k y_0 + l y_1)}. \qquad (12)$$

The significance of the 0 in the new notation for the estimate of ρ will be explained later;
k and l are self-explanatory. Thus the r defined by equation (7) is now denoted by r(0, 1, 0).

It can also be shown that r(0, k, l) is the estimate of ρ obtained from the regression of
k'y_x + l'y_{x+1} on ky_x + ly_{x+1}, where k' and l' take any values such that kl' ≠ k'l. Thus, for
example, the estimate obtained by Finney (1958) from the regression equation (5) is
identical with the estimate obtained from the regression of either y_x or y_{x+1} on y_x + y_{x+1}.
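
These equivalences are easy to confirm on a small data set. The sketch below is illustrative only (function names and the data are assumptions, not taken from the paper): it computes r(0, k, l) as the ratio of quadratic forms, recovers the same value from the regression coefficient b of y_{x+1} on ky_x + ly_{x+1} through kb/(1 − lb), and recovers r(0, 1, 1) from the regression of y_{x+1} − y_x on y_{x+1} + y_x.

```python
def center(v):
    """Subtract the mean, i.e. apply D_0 to the vector v."""
    m = sum(v) / len(v)
    return [a - m for a in v]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def r_general(y, k, l):
    """r(0, k, l) = y1' D0 (k y0 + l y1) / y0' D0 (k y0 + l y1)."""
    y0, y1 = y[:-1], y[1:]
    z = center([k * a + l * b for a, b in zip(y0, y1)])
    return dot(y1, z) / dot(y0, z)


def r_from_regression(y, k, l):
    """Same estimate via the slope b of y_{x+1} on k*y_x + l*y_{x+1}."""
    y0, y1 = y[:-1], y[1:]
    t = [k * a + l * b for a, b in zip(y0, y1)]
    b = dot(y1, center(t)) / dot(t, center(t))
    return k * b / (1.0 - l * b)


def r_from_difference_regression(y):
    """Estimate from the slope c of (y_{x+1} - y_x) on (y_{x+1} + y_x)."""
    y0, y1 = y[:-1], y[1:]
    t = [a + b for a, b in zip(y0, y1)]
    diff = [b - a for a, b in zip(y0, y1)]
    c = dot(diff, center(t)) / dot(t, center(t))
    return (1.0 + c) / (1.0 - c)   # since c estimates -(1 - rho)/(1 + rho)


y_obs = [1.0, 2.5, 3.1, 3.9, 4.2]                  # made-up observations
y_exact = [10.0 - 4.0 * 0.6 ** x for x in range(5)]  # error-free curve
```

On error-free data every member of the family returns ρ exactly (up to rounding), whatever k and l are chosen.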

A general form for quadratic estimates of ρ is suggested by the above. This is obtained
by replacing D_0 in equation (12) by D, any non-zero matrix of order (n − 1) × (n − 1) with
elements in columns summing to zero. The restriction on the column totals of D ensures
that terms in α and β are eliminated from the estimate of ρ.

If D is symmetrical it is usually most convenient to calculate the estimate from a regres- 
sion equation obtained by a transformation of (11). 

An important class of estimates, using symmetrical matrices of a special type, and having
the property of asymptotic efficiency for some value of ρ will be considered in a later section.
These estimates include r(0, k, l) considered above and the estimate suggested by Hartley
(1948).

An example of an estimate of p using an asymmetrical matrix D is provided by the 
estimate rg derived by Finney (1958) for n = 4. This can be obtained by putting k = 1, 
1=—1 and 


(12) 

























Use of autoregression in fitting an exponential curve 


EXPECTATION AND VARIANCE OF THE GENERAL QUADRATIC ESTIMATE 


The method adopted to determine the expectations and variances of quadratic estimates
of ρ is in principle exactly the same as that used by Finney (1958), but the algebra has been
developed so as to give more general results and to permit arithmetical operations to be
carried out in a systematic manner. The results given in this section apply to any quadratic
estimate of the type considered in the previous section; the special features of the estimates
r(0, k, l) will be dealt with later. The errors in the y_x are supposed to be independently and
normally distributed with variance σ².

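The following formulae for the asymptotic variance and expectation of r = A/B will
be required:

$$\operatorname{var} r \sim \frac{\operatorname{var}A + \rho^2\operatorname{var}B - 2\rho\operatorname{cov}(A,B)}{\{\mathcal{E}^*(B)\}^2}, \qquad (13)$$

$$\mathcal{E}(r) \sim \rho + \frac{V_A - \rho V_B}{\mathcal{E}^*(B)} + \frac{\rho\operatorname{var}B - \operatorname{cov}(A,B)}{\{\mathcal{E}^*(B)\}^2}, \qquad (14)$$

where V_A and V_B are the terms in σ² in $\mathcal{E}(A)$ and $\mathcal{E}(B)$, respectively, and $\mathcal{E}^*(B) = \mathcal{E}(B) - V_B$.
These expressions are suitable when σ² is small relative to β². They have been discussed in
detail by Finney (1958) and no further comment is required here.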
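The variances and covariances of A and B can be obtained by repeated use of the formula

$$\operatorname{cov}(s'Dt,\,u'Ev) = \mathcal{E}(s')\,D\,C_{tv}\,E'\,\mathcal{E}(u) + \mathcal{E}(s')\,D\,C_{tu}\,E\,\mathcal{E}(v) + \mathcal{E}(t')\,D'\,C_{sv}\,E'\,\mathcal{E}(u) + \mathcal{E}(t')\,D'\,C_{su}\,E\,\mathcal{E}(v). \qquad (15)$$

Here s, t, u, v are jointly normally distributed variates with covariance matrices

$$C_{tv} = \begin{pmatrix} \operatorname{cov}(t_1,v_1) & \operatorname{cov}(t_1,v_2) & \dots \\ \operatorname{cov}(t_2,v_1) & \operatorname{cov}(t_2,v_2) & \dots \\ \operatorname{cov}(t_3,v_1) & \operatorname{cov}(t_3,v_2) & \dots \\ \vdots & \vdots & \end{pmatrix}, \qquad (16)$$

etc. In addition the expression

$$\mathcal{E}(s'Dt) = \mathcal{E}(s')\,D\,\mathcal{E}(t) + \operatorname{trace}(D\,C_{ts}) \qquad (17)$$

is also required.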
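It is convenient to define two matrices

$$R' = (1\;\; \rho\;\; \rho^2\; \dots\; \rho^{n-2}), \qquad (18)$$

and

$$U = \begin{pmatrix} 0 & 0 & 0 & \dots & 0 & 0 \\ 1 & 0 & 0 & \dots & 0 & 0 \\ 0 & 1 & 0 & \dots & 0 & 0 \\ \vdots & & & \ddots & & \vdots \\ 0 & 0 & 0 & \dots & 1 & 0 \end{pmatrix}, \qquad (19)$$

the auxiliary identity matrix, and the following scalar quantities:

$$\begin{aligned}
F_1 &= R'DR, & F_2 &= R'D'DR,\\
F_3 &= R'D'UDR, & F_4 &= R'DD'R,\\
F_5 &= R'DUD'R, & F_6 &= R'DDR,\\
F_7 &= R'DUDR, & F_8 &= R'DU'DR.
\end{aligned} \qquad (20)$$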
The variances and covariances of A and B are then

$$\operatorname{var}A = \beta^2\sigma^2\{(k+l\rho)^2F_2 + \rho^2(k^2+l^2)F_4 + 2kl\rho^2F_5 + 2\rho l(k+l\rho)F_6 + 2k\rho(k+l\rho)F_7\}, \qquad (21)$$

$$\operatorname{var}B = \beta^2\sigma^2\{(k+l\rho)^2F_2 + (k^2+l^2)F_4 + 2klF_5 + 2k(k+l\rho)F_6 + 2l(k+l\rho)F_8\}, \qquad (22)$$

$$\operatorname{cov}(A,B) = \beta^2\sigma^2\{(k+l\rho)^2F_3 + \rho(k^2+l^2)F_4 + 2kl\rho F_5 + (k+l\rho)(\rho k+l)F_6 + (k+l\rho)(kF_7 + l\rho F_8)\}. \qquad (23)$$

Also

$$\mathcal{E}^*(B) = \beta^2(k+l\rho)F_1. \qquad (24)$$

Substitution of (21) to (24) in (13) leads to the following simple expression

$$\operatorname{var} r \sim \frac{\sigma^2}{\beta^2}\,\frac{(1+\rho^2)F_2 - 2\rho F_3}{F_1^2}. \qquad (25)$$

Thus the asymptotic variance of r is not affected by the values chosen for k and l.
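
The cancellation of k and l can be confirmed numerically. The sketch below is illustrative, not the author's computation: it evaluates the eight scalar quantities F_1 to F_8 for a small matrix D, inserts them into expressions of the form (21) to (24), applies (13), and compares the result with (25). The ordering of F_1 to F_8, the example matrix D (rows and columns both summing to zero) and all function names are assumptions of the sketch.

```python
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def chain(*mats):
    out = mats[0]
    for m in mats[1:]:
        out = matmul(out, m)
    return out


def f_quantities(d, rho):
    """Return {i: F_i} with F_1 = R'DR, F_2 = R'D'DR, ..., F_8 = R'DU'DR."""
    m = len(d)
    r = [rho ** i for i in range(m)]
    u = [[1.0 if i == j + 1 else 0.0 for j in range(m)] for i in range(m)]
    dt, ut = transpose(d), transpose(u)
    mats = {1: d, 2: chain(dt, d), 3: chain(dt, u, d), 4: chain(d, dt),
            5: chain(d, u, dt), 6: chain(d, d), 7: chain(d, u, d),
            8: chain(d, ut, d)}
    return {i: sum(r[a] * mat[a][b] * r[b] for a in range(m) for b in range(m))
            for i, mat in mats.items()}


def var_r_via_13(f, rho, k, l, beta=1.0, sigma=1.0):
    """Insert var A, var B, cov(A, B) and E*(B) into formula (13)."""
    c, s = beta ** 2 * sigma ** 2, k + l * rho
    var_a = c * (s * s * f[2] + rho ** 2 * (k * k + l * l) * f[4]
                 + 2 * k * l * rho ** 2 * f[5] + 2 * rho * l * s * f[6]
                 + 2 * k * rho * s * f[7])
    var_b = c * (s * s * f[2] + (k * k + l * l) * f[4] + 2 * k * l * f[5]
                 + 2 * k * s * f[6] + 2 * l * s * f[8])
    cov_ab = c * (s * s * f[3] + rho * (k * k + l * l) * f[4]
                  + 2 * k * l * rho * f[5] + s * (rho * k + l) * f[6]
                  + s * (k * f[7] + l * rho * f[8]))
    e_star_b = beta ** 2 * s * f[1]
    return (var_a + rho ** 2 * var_b - 2 * rho * cov_ab) / e_star_b ** 2


def var_r_closed(f, rho, beta=1.0, sigma=1.0):
    """Closed form (25)."""
    return sigma ** 2 * ((1 + rho ** 2) * f[2] - 2 * rho * f[3]) / (beta ** 2 * f[1] ** 2)


D_EXAMPLE = [[1.0, 0.0, -1.0], [-1.0, 1.0, 0.0], [0.0, -1.0, 1.0]]  # n = 4
```

For `D_EXAMPLE` and ρ = 0.5 both routes give the same value for every pair (k, l) with k + lρ ≠ 0.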
It should be noted that the asymptotic variance of the linear estimate

$$\frac{y_1'DR}{y_0'DR} \qquad (26)$$

is also given by equation (25). This result will be used in the next section to derive quadratic
estimates with minimum variance for a particular value of ρ.

The bias in r, given by the second and third terms of (14), is rather more complicated and
depends on k and l. The two bias terms are

$$\frac{\sigma^2}{\beta^2}\,\frac{(l-k\rho)\operatorname{trace}D + k\operatorname{trace}DU - l\rho\operatorname{trace}DU'}{(k+l\rho)F_1} \qquad (27)$$

and

$$\frac{\sigma^2}{\beta^2}\,\frac{(k+l\rho)(\rho F_2 - F_3) + (k\rho - l)F_6 - kF_7 + l\rho F_8}{(k+l\rho)F_1^2}. \qquad (28)$$

When D is symmetrical, F_2 = F_6 and F_3 = F_7 = F_8 so that (28) simplifies to

$$\frac{\sigma^2}{\beta^2}\,\frac{(l\rho^2 + 2k\rho - l)F_2 - 2kF_3}{(k+l\rho)F_1^2}. \qquad (29)$$


QUADRATIC ESTIMATES WITH MINIMUM VARIANCE WHEN ρ = ρ_0


The construction of a matrix D such that the quadratic estimate of ρ has minimum asymptotic
variance when ρ takes some particular value, ρ_0 say, will now be considered.

The required matrix can be obtained by minimizing the variance of the linear estimate
(26) since, as previously noted, the asymptotic variance of this estimate is the same as that
of a quadratic estimate using the same matrix D.

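The asymptotic variance of (26) when ρ = ρ_0 can be written

$$\frac{\sigma^2}{\beta^2}\,\frac{(1+\rho_0^2)\sum w_x^2 - 2\rho_0\sum w_x w_{x+1}}{\left(\sum w_x\,\rho_0^{\,x-1}\right)^2}, \qquad (30)$$

where the w_x are proportional to the elements of DR. Two restrictions need to be placed on
the w_x. First, $\sum_{x=1}^{n-1} w_x$ must equal zero; secondly, the absolute magnitude of the w_x must be
fixed. It is convenient to write $\sum_{x=1}^{n-1} w_x\,\rho_0^{\,x-1} = A$ for the second restriction.

The minimization of (30), subject to the two restrictions, leads to the n − 1 equations

$$(1+\rho_0^2)\,w_x - \rho_0(w_{x+1} + w_{x-1}) = \lambda_1\rho_0^{\,x-1} + \lambda_2, \qquad (31)$$

where x = 1, 2, ..., n − 1, w_0 = 0, w_n = 0 and λ_1, λ_2 are constant for all x. In matrix notation

$$Vw = \lambda_1 R_0 + \lambda_2 \mathbf{1}, \qquad (32)$$

where

$$V = \begin{pmatrix} 1+\rho_0^2 & -\rho_0 & 0 & \dots & 0 \\ -\rho_0 & 1+\rho_0^2 & -\rho_0 & \dots & 0 \\ 0 & -\rho_0 & 1+\rho_0^2 & \dots & 0 \\ \vdots & & & \ddots & -\rho_0 \\ 0 & 0 & \dots & -\rho_0 & 1+\rho_0^2 \end{pmatrix}, \qquad (33)$$

$$R_0' = (1\;\; \rho_0\;\; \rho_0^2\; \dots\; \rho_0^{\,n-2}),$$

w is a column vector of the w_x and 1 is a column vector with all elements equal to 1. Hence

$$w = \lambda_1 V^{-1}R_0 + \lambda_2 V^{-1}\mathbf{1}.$$

Since 1'w = 0, i.e. $\sum_{x=1}^{n-1} w_x = 0$,

$$\lambda_2 = -\lambda_1\,\frac{\mathbf{1}'V^{-1}R_0}{\mathbf{1}'V^{-1}\mathbf{1}},$$

and since λ_1 can be chosen to be 1

$$w = \left(V^{-1} - \frac{V^{-1}\mathbf{1}\mathbf{1}'V^{-1}}{\mathbf{1}'V^{-1}\mathbf{1}}\right)R_0. \qquad (34)$$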

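Thus the required matrix D which minimizes the variance of the quadratic estimate when
ρ = ρ_0 is given by

$$D = V^{-1} - \frac{V^{-1}\mathbf{1}\mathbf{1}'V^{-1}}{\mathbf{1}'V^{-1}\mathbf{1}}. \qquad (35)$$

The elements d_{ij} of D are obtained from the elements c_{ij} of V^{-1} as follows

$$d_{ij} = c_{ij} - \frac{c_{i.}\,c_{.j}}{c_{..}}, \qquad (36)$$

where a dot denotes summation over the subscript it replaces. The c_{ij} = c_{ji} are given by

$$c_{ij} = \frac{\rho_0^{\,j-i}\,(1-\rho_0^{\,2i})\,(1-\rho_0^{\,2(n-j)})}{(1-\rho_0^2)\,(1-\rho_0^{\,2n})} \qquad (i \le j), \qquad (37)$$

when ρ_0 < 1, and

$$c_{ij} = \frac{i\,(n-j)}{n} \qquad (i \le j), \qquad (38)$$

when ρ_0 = 1.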

The quadratic estimates with D as determined above can conveniently be denoted by
r(ρ_0, k, l). The estimate (12) is obtained by putting ρ_0 = 0 in V. The r(0, k, l) are therefore
fully efficient when ρ = 0; in the case n = 4 they also tend to be fully efficient when ρ
approaches 1 but this is not true for n > 4. The estimate considered by Hartley (1948) is
asymptotically efficient when ρ = 1. It is, in fact, r(1, 1, 1) in the present notation.
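
The construction of D from V can be illustrated with exact rational arithmetic. The sketch below is an illustration under stated assumptions rather than the author's procedure: the closed form used for the elements of V^{-1} is the standard inverse of a tridiagonal matrix with diagonal 1 + ρ_0² and off-diagonal −ρ_0 (with i(n − j)/n when ρ_0 = 1), and the function names are hypothetical.

```python
from fractions import Fraction


def v_matrix(n, rho0):
    """Tridiagonal V of order n - 1: diagonal 1 + rho0^2, off-diagonal -rho0."""
    rho0, m = Fraction(rho0), n - 1
    return [[1 + rho0 ** 2 if i == j else (-rho0 if abs(i - j) == 1 else Fraction(0))
             for j in range(m)] for i in range(m)]


def v_inverse(n, rho0):
    """Closed-form elements c_ij of the inverse of V (assumed standard result)."""
    rho0, m = Fraction(rho0), n - 1
    c = [[Fraction(0)] * m for _ in range(m)]
    for a in range(1, m + 1):
        for b in range(1, m + 1):
            i, j = min(a, b), max(a, b)
            if rho0 == 1:
                c[a - 1][b - 1] = Fraction(i * (n - j), n)
            else:
                c[a - 1][b - 1] = (rho0 ** (j - i) * (1 - rho0 ** (2 * i))
                                   * (1 - rho0 ** (2 * (n - j)))
                                   / ((1 - rho0 ** 2) * (1 - rho0 ** (2 * n))))
    return c


def d_matrix(n, rho0):
    """d_ij = c_ij - c_i. c_.j / c_.. , i.e. V^-1 - V^-1 1 1' V^-1 / (1' V^-1 1)."""
    c, m = v_inverse(n, rho0), n - 1
    row = [sum(c[i]) for i in range(m)]
    col = [sum(c[i][j] for i in range(m)) for j in range(m)]
    total = sum(row)
    return [[c[i][j] - row[i] * col[j] / total for j in range(m)] for i in range(m)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]
```

With ρ_0 = 0 the matrix V is the identity, so `d_matrix(n, 0)` has diagonal elements (n − 2)/(n − 1) and off-diagonal elements −1/(n − 1), which is the centring matrix used for r(0, k, l).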








The estimates r(ρ_0, k, l) can be regarded as the quadratic counterparts of the estimates
considered by Patterson (1956). They can be expected to cover a wider range of ρ around
ρ_0 with high efficiency since D(ky_0 + ly_1) makes some allowance for the difference between
ρ and ρ_0. On the other hand, the r(ρ_0, k, l) are generally more difficult to calculate and may
be subject to considerably greater bias. The linear estimates are subject to a bias term
of the order of magnitude of (29), but the quadratic estimates are subject to an additional
term (27) which can be very large.

One method of calculation when ρ_0 < 1 is to estimate ρ (and, if required, α and β) in the
following regression equation on ρ_0^x and Y_x.


a a z , (P—Po) 





where x = 0,1,...,n—1 a(1— Pp) (k+lpo) 
~ (1=po) (e+ Ip) ’ 


) —PoYz-1 a ky,-1 + ly,, 





and Y_0 is taken to be zero. When ρ_0 = 1, the following regression equation on x and Y_x
can be used:





a a(1—p) (k+l), (p—1) 
Y, = (a—f)+ (k+lp) 2+ Epp (40) 


It will be noted that when ρ_0 = 0 and Y_0 is taken to be 0 equation (39) degenerates to
equation (11) together with the initial equation

$$y_0 = \alpha - \beta.$$


EXPECTATION AND VARIANCE OF r(0, k, l)


The family of estimates r(0, k, l) has been considered in some detail for n = 4, 5, 6 and 7.

The matrix D = D_0 defined for equations (9) and (10) is particularly simple. Its trace,
required in (27), is n − 2 and trace DU is −(n − 2)/(n − 1). In addition F_1 = F_2.
Expressions for F_1 and F_3 are as follows:

F_1:  n = 4:  (2/3)(1 − ρ)²(1 + ρ + ρ²),
      n = 5:  (1/4)(1 − ρ)²(3 + 4ρ + 6ρ² + 4ρ³ + 3ρ⁴),
      n = 6:  (2/5)(1 − ρ)²(2 + 3ρ + 5ρ² + 5ρ³ + 5ρ⁴ + 3ρ⁵ + 2ρ⁶),
      n = 7:  (1/6)(1 − ρ)²(5 + 8ρ + 14ρ² + 16ρ³ + 19ρ⁴ + 16ρ⁵ + 14ρ⁶ + 8ρ⁷ + 5ρ⁸).

F_3:  n = 4:  (1/9)(1 − ρ)²(−1 + 2ρ − ρ²),
      n = 5:  (1/16)(1 − ρ)²(−1 + 8ρ + 6ρ² + 8ρ³ − ρ⁴),
      n = 6:  (1/25)(1 − ρ)²(−1 + 16ρ + 20ρ² + 30ρ³ + 20ρ⁴ + 16ρ⁵ − ρ⁶),
      n = 7:  (1/36)(1 − ρ)²(−1 + 26ρ + 38ρ² + 64ρ³ + 61ρ⁴ + 64ρ⁵ + 38ρ⁶ + 26ρ⁷ − ρ⁸).
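
These polynomial expressions can be checked against a direct evaluation. The sketch below is illustrative only: the leading fractions 2/3, 1/4, 2/5, 1/6 and 1/9, 1/16, 1/25, 1/36 are those implied by the definitions F_1 = R'D_0R and F_3 = R'D_0'UD_0R (they are hard to read in the printed source, so they are treated here as assumptions to be verified), and the function names are hypothetical.

```python
from fractions import Fraction

# (leading factor, polynomial coefficients in ascending powers of rho)
F1_FORMS = {4: (Fraction(2, 3), [1, 1, 1]),
            5: (Fraction(1, 4), [3, 4, 6, 4, 3]),
            6: (Fraction(2, 5), [2, 3, 5, 5, 5, 3, 2]),
            7: (Fraction(1, 6), [5, 8, 14, 16, 19, 16, 14, 8, 5])}
F3_FORMS = {4: (Fraction(1, 9), [-1, 2, -1]),
            5: (Fraction(1, 16), [-1, 8, 6, 8, -1]),
            6: (Fraction(1, 25), [-1, 16, 20, 30, 20, 16, -1]),
            7: (Fraction(1, 36), [-1, 26, 38, 64, 61, 64, 38, 26, -1])}


def closed_form(forms, n, rho):
    """Evaluate factor * (1 - rho)^2 * polynomial(rho)."""
    factor, coeffs = forms[n]
    return factor * (1 - rho) ** 2 * sum(c * rho ** i for i, c in enumerate(coeffs))


def direct_f1_f3(n, rho):
    """F_1 = R' D0 R and F_3 = (D0 R)' U (D0 R) computed from the definitions."""
    m = n - 1
    r = [rho ** i for i in range(m)]
    mean = sum(r) / m
    w = [a - mean for a in r]                      # D0 R
    f1 = sum(a * b for a, b in zip(r, w))
    f3 = sum(w[i] * w[i + 1] for i in range(m - 1))
    return f1, f3
```

Using `Fraction` values for ρ makes the comparison exact, so agreement is an identity check rather than a floating-point coincidence.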


Although not essential to the determination of efficiency and bias these expressions are
useful for studying the limiting case as ρ → 1 when (25) and (29) become indeterminate.


EFFICIENCIES OF THE ESTIMATES r(0, k, l)


The efficiencies of r(0, k, l) when n = 4, 5, 6 and 7 are set out in Table 1. These are given by
the ratios of the asymptotic variances of the least squares estimates obtained by the method
of Stevens (1951) to the variances given by (25). For comparison the efficiencies of the
following estimates

$$\rho = \frac{4y_3 + y_2 - 5y_1}{4y_2 + y_1 - 5y_0} \quad \text{for } n = 4, \qquad (41)$$

$$\rho = \frac{4y_4 + 3y_3 - y_2 - 6y_1}{4y_3 + 3y_2 - y_1 - 6y_0} \quad \text{for } n = 5, \qquad (42)$$

$$\rho = \frac{4y_5 + 4y_4 + 2y_3 - 3y_2 - 7y_1}{4y_4 + 4y_3 + 2y_2 - 3y_1 - 7y_0} \quad \text{for } n = 6, \qquad (43)$$

$$\rho = \frac{y_6 + y_5 + y_4 - y_2 - 2y_1}{y_5 + y_4 + y_3 - y_1 - 2y_0} \quad \text{for } n = 7, \qquad (44)$$

are also shown in Table 1. These are the estimates proposed by Patterson (1956).
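
The comparison can be sketched numerically for the ratio of the two asymptotic variances. The code below is an illustration, not the computation behind Table 1: it uses the variance factor of a linear estimate with weights w_x, namely {(1 + ρ²)Σw_x² − 2ρΣw_xw_{x+1}}/(Σw_xρ^{x−1})², with the weights read off (41) to (44), and compares it with the same factor for the centred weights that give the variance of r(0, k, l). It does not compute the least squares variance of Stevens (1951), so only relative efficiencies are obtained; at ρ = 0, where r(0, k, l) is fully efficient, these coincide with the absolute ones. All names are hypothetical.

```python
# Weights w_1 .. w_{n-1} read from the numerators of (41) to (44).
PATTERSON_WEIGHTS = {4: [-5, 1, 4],
                     5: [-6, -1, 3, 4],
                     6: [-7, -3, 2, 4, 4],
                     7: [-2, -1, 0, 1, 1, 1]}


def variance_factor(w, rho):
    """Asymptotic variance of sum(w_x y_x)/sum(w_x y_{x-1}) in units of sigma^2/beta^2."""
    s2 = sum(a * a for a in w)
    s11 = sum(w[i] * w[i + 1] for i in range(len(w) - 1))
    s = sum(a * rho ** i for i, a in enumerate(w))
    return ((1 + rho * rho) * s2 - 2 * rho * s11) / s ** 2


def regression_weights(n, rho):
    """Weights D0 R, whose variance factor equals that of r(0, k, l)."""
    r = [rho ** i for i in range(n - 1)]
    mean = sum(r) / len(r)
    return [a - mean for a in r]


def relative_efficiency(n, rho):
    """Variance of r(0, k, l) divided by variance of the Patterson estimate."""
    return (variance_factor(regression_weights(n, rho), rho)
            / variance_factor(PATTERSON_WEIGHTS[n], rho))
```

At ρ = 0 the ratios are about 0.893, 0.774, 0.652 and 0.600 for n = 4 to 7, and at ρ = 0.5 with n = 4 the two variances are equal.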


Table 1. Efficiencies (%) of simple estimates of ρ

            n = 4                n = 5                n = 6                n = 7
  ρ    r(0,k,l)  Eq. (41)   r(0,k,l)  Eq. (42)   r(0,k,l)  Eq. (43)   r(0,k,l)  Eq. (44)

 0.0    100.0     89.3       100.0     77.4       100.0     65.2       100.0     60.0
 0.1     99.8     95.4        99.4     88.5        99.2     79.2        99.1     74.?
 0.2     99.4     98.6        98.0     95.7        97.2     90.2        96.7     86.8
 0.3     99.2     99.8        96.6     99.2        94.6     97.0        93.3     95.0
 0.4     99.2     99.9        95.5     99.9        92.2     99.6        89.7     98.1
 0.5     99.4     99.4        95.0     98.9        90.4     98.8        86.6     96.8
 0.6     99.6     98.5        94.8     97.0        89.4     96.1        84.5     92.4
 0.7     99.8     97.4        94.9     94.7        89.1     92.5        83.5     86.9
 0.8     99.9     96.3        95.1     92.3        89.1     88.8        83.2     81.3
 0.9    100.0     95.3        95.2     90.0        89.2     85.2        83.3     76.3
 1.0    100.0     94.2        95.2     87.8        89.3     81.9        83.3     72.0




















The results for n = 4 are in accord with those given by Finney (1958). r(0, k, l) has very
high efficiency over the whole range of ρ, and gives a rather better overall performance than
the estimate proposed by Patterson (1956). This superiority is not, however, maintained
for n > 4; when n = 7, r(0, k, l) is substantially less efficient than the estimate (44) over the
range of the most useful values of ρ.


BIAS IN r(0, 1, 0)


The total bias given by (27) and (29) can be expressed as a multiple of the asymptotic
variance of r as follows:

$$\text{bias} = \theta \operatorname{var} r.$$

Values of θ for the estimates r(0, 1, 0) and the estimates (41) to (44) are set out for n = 4 to 7
in Table 2.
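
The multiplier θ for r(0, 1, 0) can be sketched as follows. This is an illustration under assumptions: the first bias term is taken as {(l − kρ) trace D + (k − lρ) trace DU}/{(k + lρ)F_1} and the second as {(lρ² + 2kρ − l)F_2 − 2kF_3}/{(k + lρ)F_1²}, both in units of σ²/β², with D = D_0 so that F_2 = F_1, trace D = n − 2 and trace DU = −(n − 2)/(n − 1). The form of the first term is a reconstruction, and the function name is hypothetical.

```python
def theta(n, rho, k=1.0, l=0.0):
    """Ratio bias / var r for r(0, k, l) under the constant variance model."""
    m = n - 1
    r = [rho ** i for i in range(m)]
    mean = sum(r) / m
    w = [a - mean for a in r]                        # D0 R
    f1 = sum(a * b for a, b in zip(r, w))            # F_1 (= F_2 for D0)
    f3 = sum(w[i] * w[i + 1] for i in range(m - 1))  # F_3
    tr_d = n - 2.0
    tr_du = -(n - 2.0) / (n - 1.0)
    s = k + l * rho
    term_trace = ((l - k * rho) * tr_d + (k - l * rho) * tr_du) / (s * f1)
    term_quad = ((l * rho ** 2 + 2 * k * rho - l) * f1 - 2 * k * f3) / (s * f1 ** 2)
    variance = ((1 + rho ** 2) * f1 - 2 * rho * f3) / f1 ** 2
    return (term_trace + term_quad) / variance
```

Under these assumptions `theta(4, 0.0)` is −1/3 and `theta(4, 0.5)` is about −0.486, in line with the tabulated values for r(0, 1, 0).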











Table 2. Bias in r(0, 1, 0) compared with Patterson’s estimates

Values of θ = bias/var r

            n = 4                n = 5                n = 6                n = 7
  ρ    r(0,1,0)  Eq. (41)   r(0,1,0)  Eq. (42)   r(0,1,0)  Eq. (43)   r(0,1,0)  Eq. (44)

 0.0   −0.333     0.024     −0.583    −0.242     −0.700    −0.415     −0.767    −0.500
 0.1   −0.409     0.122     −0.821    −0.148     −1.068    −0.340     −1.250    −0.439
 0.2   −0.460     0.213     −1.042    −0.038     −1.439    −0.246     −1.754    −0.357
 0.3   −0.488     0.293     −1.241     0.062     −1.814    −0.137     −2.291    −0.253
 0.4   −0.495     0.359     −1.408     0.164     −2.185    −0.018     −2.866    −0.132
 0.5   −0.486     0.411     −1.535     0.256     −2.530     0.102     −3.461     0.000
 0.6   −0.465     0.449     −1.613     0.335     −2.818     0.215     −4.035     0.132
 0.7   −0.435     0.475     −1.642     0.398     −3.019     0.314     −4.516     0.253
 0.8   −0.402     0.491     −1.626     0.446     −3.112     0.395     −4.829     0.357
 0.9   −0.367     0.499     −1.575     0.479     −3.099     0.456     −4.932     0.439
 1.0   −0.333     0.500     −1.500     0.500     −3.000     0.500     −4.833     0.500








For n > 4 the bias in the simple regression estimate is considerably greater than for the
Patterson estimates. Thus, for example, if each y has a standard error of ±0.1β the bias in
r(0, 1, 0) when ρ = 0.6 and n = 7 is approximately −0.06. A bias of this magnitude must be
regarded as serious particularly if average values of ρ over several sets of data are required.

It should be noted that as n increases the bias in r(0, 1, 0) becomes relatively more
important, as predicted by Yates (quoted by Finney (1958)).


ESTIMATES r(0, k, l) WITH LOW BIAS


From the above it will be seen that the value of r(0, 1, 0) as an estimate of ρ is severely
limited. There are, however, estimates in the same family r(0, k, l) for which the biases are
much smaller.

As previously noted the magnitude of the bias depends on k and l. In fact if

$$\frac{k}{l} = \frac{\{(n-3)(n-1) + (n-2)\rho + (n-1)\rho^2\}\,F_1}{\{(n-2) + (n-1)(n-4)\rho\}\,F_1 + 2(n-1)F_3} \qquad (45)$$

the bias is zero. Values of k/l giving zero bias are set out in Table 3.
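
The ratio in (45) can be evaluated directly and checked against the bias it is meant to remove. The sketch below is illustrative: `zero_bias_ratio` codes (45) with F_1 and F_3 computed for D_0, while `total_bias` uses an assumed form of the two bias terms (in units of σ²/β²) with trace D = n − 2 and trace DU = −(n − 2)/(n − 1); both function names are hypothetical.

```python
def f1_f3(n, rho):
    """F_1 = R' D0 R and F_3 = (D0 R)' U (D0 R)."""
    m = n - 1
    r = [rho ** i for i in range(m)]
    mean = sum(r) / m
    w = [a - mean for a in r]
    return (sum(a * b for a, b in zip(r, w)),
            sum(w[i] * w[i + 1] for i in range(m - 1)))


def zero_bias_ratio(n, rho):
    """Value of k/l given by (45)."""
    f1, f3 = f1_f3(n, rho)
    num = ((n - 3) * (n - 1) + (n - 2) * rho + (n - 1) * rho ** 2) * f1
    den = ((n - 2) + (n - 1) * (n - 4) * rho) * f1 + 2 * (n - 1) * f3
    return num / den


def total_bias(n, rho, k, l):
    """Sum of the two asymptotic bias terms of r(0, k, l), units sigma^2/beta^2."""
    f1, f3 = f1_f3(n, rho)
    s = k + l * rho
    tr_d, tr_du = n - 2.0, -(n - 2.0) / (n - 1.0)
    term_trace = ((l - k * rho) * tr_d + (k - l * rho) * tr_du) / (s * f1)
    term_quad = ((l * rho ** 2 + 2 * k * rho - l) * f1 - 2 * k * f3) / (s * f1 ** 2)
    return term_trace + term_quad
```

For n = 4 the ratio is 3 at ρ = 0 and about 2.56 at ρ = 0.5, and inserting k = `zero_bias_ratio(n, rho)` with l = 1 into `total_bias` gives zero up to rounding.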

Thus for n = 4 the regression of y_{x+1} on 2.75y_x + y_{x+1} can be expected to produce
reasonably unbiased estimates of ρ (given by 2.75b/(1 − b), where b is the regression coefficient)
for moderate values of ρ. The actual biases in this and other cases, again expressed as
multiples of the asymptotic variances, are set out in Table 4.

For moderate values of ρ such as are likely to arise in practice the biases are in each case
substantially less than the biases in the estimates r(0, 1, 0). Choice of a suitable value of
k/l does, however, become more difficult as n increases. Thus, when n = 7 the bias in the
estimate proposed by Patterson (1956) (see Table 2) is smaller than the bias in r(0, 1.5, 1)
or, indeed, in any single r(0, k, l) for most values of ρ.

that, whilst r(0, 1, 0) is fully efficient when the errors follow a first-order Markoff process, as
pointed out by Finney (1958), the Stevens estimate of p is subject to only small biases and 
moderate losses of efficiency. In practice there are likely to be random errors in the y, 
whether or not there are also autocorrelated error components. My own view is that the 
method of Stevens can be safely recommended for most biological applications. 


SUMMARY 


The behaviour of the simple regression of y_{x+1} on y_x as an estimate of ρ in the equation

$$y_x = \alpha - \beta\rho^x,$$

where 0 < ρ < 1, x = 0, 1, 2, ..., n − 1 and the y_x are subject to independent errors with
equal variance has been investigated.

As pointed out by Finney (1958) the estimate is of high efficiency and is subject to a
relatively small bias when n = 4. As n increases, however, the efficiency decreases markedly
over the useful range of ρ and the estimate is subject to an increasingly large negative bias.
These properties make the estimate unsuitable for general use.

Alternative estimates given by the regression of y_{x+1} on ky_x + ly_{x+1} have also been
considered. These estimates have the same asymptotic efficiency as the simple regression, but
by a suitable choice of k/l the bias can be considerably reduced. They are, however, of only
limited value since in practice choice of k/l is difficult unless n is small.


I am grateful to Dr D. J. Finney for letting me see his paper in draft form before
publication.

Note added in proof. Equations (39) and (40) are obviously not suitable if k + lρ_0 = 0,
but alternative equations giving the same estimate of ρ but different estimates of α and
β can be used. Various forms of equation (40) with Y_0 not necessarily zero have been
considered by R. F. White in an unpublished thesis (Iowa State College, 1956).


TWO QUEUES IN PARALLEL


By FRANK A. HAIGHT 
Institute of Transportation and Traffic Engineering, University of California 


SUMMARY. The method of differential-difference equations is used to investigate the case in which 
each arrival to a system of two queues joins the shorter queue, or, if they are of equal length, one 
particular queue. In case each person must remain in the queue which he originally joins, relations 
are obtained between the asymptotic state probabilities. If queuers are permitted to change queues 
whenever it seems advantageous to do so, the formulation is simplified, and explicit expressions are 
obtained. 


1. INTRODUCTION


Several writers have analysed a queueing system in which service is provided by several 
facilities operating jointly, in such a way that, to obtain service, each queuer need deal with 
only one of the service facilities. Simple examples of this system can be seen in the multiple 
queues of banks, government offices, etc. 

It has been the practice, however, to assume that arrivals to the system are assigned to the 
queues in rotation, and that they must remain in the queue to which they were assigned. 
Our experience clearly indicates that this assumption is unrealistic; people prefer to join the 
shortest queue, if there is one. Furthermore if, while they wait, some other queue becomes 
shorter, they will change queues. In this paper we limit the number of queues to two; and 
study separately the cases in which queuers do/do not change queues. 

Arrivals to the system will be assumed to be in the form of a homogeneous Poisson process
with parameter λ, and service will also be Poisson, with parameters μ and μ' for the two
queues. The length at time t of the queues will be denoted by X(t) and X'(t). An arrival will
join the queue characterized by X(t) if and only if, at his time of arrival

X(t) ≤ X'(t).

If, on the other hand, an arrival finds

X(t) > X'(t)

he will join the queue characterized by X'(t). Thus the queue X(t) has a certain advantage
in absorbing arrivals in the equally advantageous case, and so will be called the ‘near’
queue. Similarly, we will call X'(t) the ‘far’ queue.


$$p_{xy}(t) = \Pr\{X(t) = x,\ X'(t) = y\},$$
$$p_{x.}(t) = \Pr\{X(t) = x\},$$
$$p_{.y}(t) = \Pr\{X'(t) = y\}.$$


In the calculations which follow, we will be concerned mainly with statistical equilibrium
(t = ∞), and indicate this by suppression of t:

$$p_{xy} = p_{xy}(\infty), \qquad p_{x.} = p_{x.}(\infty), \qquad p_{.y} = p_{.y}(\infty).$$

Moments of these distributions will be denoted by the following system:

$$E[X(\infty)] = \sum_x x\,p_{x.} = m, \qquad E[X'(\infty)] = \sum_y y\,p_{.y} = m',$$
$$\operatorname{var}[X(\infty)] = v = \sigma^2, \qquad \operatorname{var}[X'(\infty)] = v' = \sigma'^2,$$
$$\operatorname{cov}[X(\infty), X'(\infty)] = r\sigma\sigma'.$$

For B(s, s') and its partial derivatives we have the following values:


s=e=0 se=1,82=0 s=0,8’=1 axe = 1 
B L’Por WD, L'(P — Poo) ph (1—p’) 
B, KP hpam E’'(P1.— Pro) fh’ (m—mop’) 
Be Poe WP.» L'(pmo— P+ Poo) h'(m’—1+p’) 
Bop 2h’Pay B'p.s(v, +m — mM) 21'(P2.— Poo) HL’ (v+m?—m—p'v)— p’'m+ p’m) 
Byy 2'Pos 2u'D.s L'(prvo + pme? — 3pm + 2p — 20) h'(v’ +m’? — 3m’ + 2 — 2p’) 
Bw EK Pi2 K'p_2Ms HM P,,—Pr.+ Pr) b'(oo'r +mm’ —m +m) 


(C) Departure from near queue 


This case is perfectly symmetrical with the preceding one, and the relevant formulae can
be written down at once. The term required for (b) is

$$\mu\,p_{x+1,y}(t). \qquad (8)$$

The term required for (c) is

$$C(s, s') = (\mu/s)\,\phi(s, s') - (\mu/s)\,\phi_0(s'), \qquad (9)$$

where

$$\phi_0(s') = \sum_{y=0}^{\infty} p_{0y}\,s'^{\,y}. \qquad (10)$$


The following table is obtained by exchanging s and s’, w and y’, x and y: 


s=e' =0 s=l,e’°=0 e=0,e’=1 s=e’=1 
CLP» LM(p’ — Poo) EPpy. M1 —p) 
C, P20 LM(p’my — p’ + Poo) LPs, H(m—1+>p) 
Cy UP /(P.1— Por) Lp, Mm Hm’ — mop) 
Cre 2UPso L(p'% +p’ms — 3p’mg + 2p’ — 2p) 2Ups, (vv +m? —3m + 2—2p) 
Cyy 2UPr.2 21(P. 2 — Por) Mp, (vi +mye—m) wv’ +m”? —m’ — pro— pmo? + pmo) 
Coe UP an Mm, P.1—P.1rt+Pon) UP2.m, M(oo’r+mm’ —m’ +m) 


(D) Entrance into the system 


So far the development has been fairly straightforward. The characteristic difficulty of the
problem occurs in this section, for an arrival to the system when X(t) ≤ X'(t) will produce
a different transition from the one produced if X(t) > X'(t).

The probability of an arrival is

$$(1 - \mu\Delta t)(1 - \mu'\Delta t)\,\lambda\Delta t. \qquad (11)$$

If the system is to be in state x, y at time t + Δt, then there are three possibilities for time t:

(i) If x < y, then, at time t, the system must have been in state x − 1, y, since the arrival
must have joined the near queue.

(ii) If x > y + 1, then, at time t, the system must have been in state x, y − 1, since the
arrival must have joined the far queue.

(iii) In the intermediate cases x − 1 = y and x = y, we cannot say whether the state at
time t was x − 1, y or x, y − 1, for either is possible.

Therefore, the factor which must be multiplied by (11) is of the form

$$I(x, y)\,p_{x-1,y} + J(x, y)\,p_{x,y-1}, \qquad (12)$$

where the values of I and J are given by the following table:

         y = 0    y = 1    y = 2    y = 3    y = 4
x = 0      —      I = 1    I = 1    I = 1    I = 1
           —      J = 0    J = 0    J = 0    J = 0
x = 1    I = 1    I = 1    I = 1    I = 1    I = 1
         J = 1    J = 1    J = 0    J = 0    J = 0
x = 2    I = 0    I = 1    I = 1    I = 1    I = 1
         J = 1    J = 1    J = 1    J = 0    J = 0
x = 3    I = 0    I = 0    I = 1    I = 1    I = 1
         J = 1    J = 1    J = 1    J = 1    J = 0

We now show the situation at time t + Δt, giving in each cell the states from which that value
could have arisen, with an asterisk to denote an impossible situation. For convenience in
computing the generating-function equation, the appropriate powers of s and s' are given in
place of the values of x and y.

         s'^0     s'^1     s'^2     s'^3     s'^4
s^0       *        *        *        *        *
s^1      0, 0     1, 0     0, 2     0, 3     0, 4
                  0, 1
s^2       *       2, 0     2, 1     1, 3     1, 4
                  1, 1     1, 2
s^3       *       3, 0     3, 1     3, 2     2, 4
                           2, 2     2, 3
s^4       *       4, 0     4, 1     4, 2     4, 3
                                    3, 3     3, 4
s^5       *       5, 0     5, 1     5, 2     5, 3
                                             4, 4
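
The table of possible previous states follows mechanically from the joining rule. The sketch below is illustrative only (the function names are hypothetical): it encodes the rule that an arrival joins the near queue when x ≤ y and the far queue otherwise, and lists the states at time t from which a single arrival leads to a given state at time t + Δt.

```python
def join(x, y):
    """State after an arrival finds near-queue length x and far-queue length y."""
    return (x + 1, y) if x <= y else (x, y + 1)


def predecessors(x, y):
    """States at time t from which one arrival produces (x, y) at time t + dt."""
    candidates = [(x - 1, y), (x, y - 1)]
    return [s for s in candidates if min(s) >= 0 and join(*s) == (x, y)]


def table(max_x=5, max_y=4):
    """Rows indexed by x (power of s), columns by y (power of s')."""
    return {(x, y): predecessors(x, y)
            for x in range(max_x + 1) for y in range(max_y + 1)}
```

An empty list corresponds to an asterisk in the table, for example `predecessors(2, 0)`, and a list of two states corresponds to one of the intermediate cases in (iii), for example `predecessors(1, 1)`.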
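Defining

$$\gamma(s, s') = \sum_{x \le y} p_{xy}\,s^x s'^{\,y}$$

and adding the terms in bold type in the above table separately from those not in bold type,
we find

$$D(s, s') = \lambda\Big(s\sum_{y=0}^{\infty} p_{0y}s'^{\,y} + s^2\sum_{y=1}^{\infty} p_{1y}s'^{\,y} + s^3\sum_{y=2}^{\infty} p_{2y}s'^{\,y} + \dots\Big) + \lambda\Big(s\,s'\sum_{y=0}^{0} p_{1y}s'^{\,y} + s^2 s'\sum_{y=0}^{1} p_{2y}s'^{\,y} + s^3 s'\sum_{y=0}^{2} p_{3y}s'^{\,y} + \dots\Big),$$

$$D(s, s') = \lambda s\gamma + \lambda s'(\phi - \gamma). \qquad (13)$$

The evaluation of D and its derivatives follows: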
s=s'=0 s=l1,8’=0 s=0,8’=1 s=7 = li 
the D 0 Avo 0 r 
D, APoo APoo A(p + Pro) A(N +m) 
e at Dy 0 Ap’ — Poo + Port Pir) 0 AF +m’) 
Ds 0 0 2A(P1.— P10 + P20 + Par) — 
Dey 0 2A(P.1— Por — Pir + Por + Piz t+ Poo) 0 ; — 
Dy A(P19 + Por) A( Por + 2Pir + P’M) A(Pio + PM) = 
D_{ss}(1, 1), D_{s's'}(1, 1) and D_{ss'}(1, 1) will be evaluated later.

Adding (3), (6), (9) and (13), we find the generating function equation of the system


(18) 6+ (6-8) = 5" 1 —paile 1+ [8 —P'aol 


(14) 


This equation could be differentiated to obtain the equations which follow, but it is easier 
(and there is less chance of error due to division by zero) to combine the values given in the 


four tables. Taking the rows from left to right in order gives 
—Apoo+ Por + Pr = 9, 
—Ap'+H'P 1 +APoo = 9, 
—Ap+pup,, = 9, 
0 = 0, 
—AP0— P10 + F'P1r + HP 20 + APoo = 9, 
—Ap'mg + L'P 1M — P'L + HPoo + APoo = 9, 
—AP1.— LP. + Po. + AP t+ AP = 9, 
AN = w(1—p), 
—Apy—!'Pat! Pot Pu = 9, 
—AD 1-H PD AtHhP 2+ Ap' —Apoot AP tAPu = 9, 
—Apm— L'p + L'Doo+ HP 1,.m, = O, 
AF = p'(1—p’), 
—AP29— P20 + L' Par + HP ao = 9, 
—Ap'Wo + Lp w,— 2p’ pM, + 2p'w— WP = 9, 
(where w = v+m?—™m, etc.) 
—Apy,— MP2. + HPs. + AP, — Ayo + AP + Aoi = 9, 
D,,(1, 1) = Aw + 2m — 24+ 2p, 
—Apor—H' Port H' Pos + HPi2 = 9, 
—AP.2—L'P 2+ L'P.3 + AD_1—APor — AP + APos + AP12 + APaz = O, 
— Apwy — 2p’ ping + 2p'p — 210’ Doo + UP1,W, = 9, 
Dy Al, 1) = Aw’ + 2m'p' — 2p’ + 2y'p’, 
—(A+L+K) Pir t+H'Py2 + HP +A(Por + Pro) = 9; 
—(A+2') p.m, +H'D 9M — Up. + LPo+A(Por + 2P11 + P’Mo) = O, 
—(A+f) Pym, + Mpa.ms— LP, + H'Pyo+ A(Pyo + pm) = 9, 
D,(1, 1) = Ago’r+Amm' + p'm + pm’ — pm, — pm. 


3. INDIVIDUAL QUEUE LENGTH DISTRIBUTIONS 


(15 


o— 
_ 
a ¢ 


We will use capital letters to denote compound probabilities in the conditional distributions 


x 
Pyke = 2 Piw Qry = 1-P,,; 


vy 
Su."as = Pi Pes Qey =1 - Pi: 
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Using equations (17), (20), (28), and others which can be obtained by further differentia- 
tion with respect to s, we obtain 




Pi. ae wpa } (38) 
Pr. = PQz-1,2-2P 2-1" 

For the far queue, we begin with (16), (23), (31), ..., and find 
Py P'Qy-1,y-1P. y-1° (39) 


Equations (38) and (39) generalize a formula (Haight, 1957, p. 362 (2)) which applies to the 
special case in which the far queue (there called the ‘balking distribution’) is independent 
of t. 

Some other interesting relationships can be found by adding up directly various membexs 
of the set of difference equations (2) + (5)+(8)+(12) = 0. For example, adding equations 
for which either x or y has a fixed value N, we find 

Apyn = FPN. Pyint!PwiPywis (N = 0,1,2,.--). (40) 
Summing over all JN, 
A >» Pry = >» Sat >> Pry: (41) 
z=y a>y a<y 
On the other hand, choosing equations for which «+y = N, and adding, we obtain 


. 2 yew = (¢+’) (Py,w-1+Py-t,y) +¥Pwit0t+'Powir (N =9,1,2,...), (42) 


Zz 


where p is set equal to zero when it has a negative subscript. 
Letting 7, denote the probability that the whole system contains members, we can 
write, from (42) 


1 1 
1, = p (7724.1 — Po, x41) + (77241 — Px+1,0)+ (43) 


4. VARIOUS EQUATIONS INVOLVING THE MOMENTS 


z 
P, = U Pin Q, =1-P,; 


y 
Py= Eps = 1-Py; 


denote the compound probabilities in the marginal distributions. From the set of equations 
beginning with (19), (35), ..., the following recurrence formula in the conditional means can 


be found: 
Zz 


EMo1P.ct = Amz Pt PZ — MPP oz ALL (44) 
where

$$Z_0 = p_{00},$$
$$Z_1 = p_{01} + 2p_{11},$$
$$Z_2 = p_{02} + 2p_{12} + 3p_{22} - p_{11},$$
$$Z_3 = p_{03} + 2p_{13} + 3p_{23} + 4p_{33} - p_{12} - 2p_{22}.$$

Similarly, beginning with equations (24), (36), ...,


y 
PM 1 Py. = Amy Py. +H'P,—L'p'Pyy—A ~ Z} (45) 
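where

$$Z'_0 = 0,$$
$$Z'_1 = p_{10},$$
$$Z'_2 = p_{20} + 2p_{21},$$
$$Z'_3 = p_{30} + 2p_{31} + 3p_{32} - p_{21},$$
$$Z'_4 = p_{40} + 2p_{41} + 3p_{42} + 4p_{43} - p_{31} - 2p_{32}.$$

It will be noted that

$$\sum_{0}^{\infty} Z_x = N, \qquad \sum_{0}^{\infty} Z'_y = F.$$

Thus the limiting form of (44) is (21), and of (45) is (25).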
A relationship between m and m’ can be found from (29), (33) and (37) 


D,,(1,1)+D,(1,1) =A XL (@+y)?Pryt+ X U*Dzy, 
xray z>y 


D,,(1,1) = Aw+ 2A > xpz, (46) 
xzsy 
and, comparing with (29) 
E &Pzy = (1/p)(m—4). (47) 
zsy 
Similarly E Yay = (1p’)(m' —q’). (48) 
x>y 


Now 
Dyg(1, 1) + D1, 1)+ Dl, +A SE ype +a » YPry =A Y (y+ 2+ 2y+1) py 
and therefore ia ie 7, 
Deg(1, 1) = —A(N +m) — A(F +m’) — (Alp) (m—q) — (A/p’) (m' —q') 
+A ¥ (xy + 2x + 2y +1) py. 
Comparing with (37), we find * 
(A—p—p’) (m+m') + (A+ pmg + p'my) = 0. (49) 
It is also interesting to note that since 
Dyg(1, 1) = Larypzy +X max (2x, Y) Pry, 
mM—M, mM’ —mMs 


x max (x, = pot 50 
(2,Y) Pry r rm (50) 





5. CHANGING QUEUES PERMITTED 


We will now suppose that some individual (say the last) from a queue will go over to the
other queue whenever it appears advantageous to do so, that is, whenever his place in the
queue will be improved. This means that the queues can never (except for the instant required
to change queues, which we will ignore) differ in length by more than one, so that the possible
states of the system must be of the form x − 1, x or x, x or x, x − 1. Equivalently, we might say
that each discharge from a queue is to be considered as taking place from the longer queue,
unless they are equally long.

The difference equations in equilibrium are not difficult to find, keeping in mind this
principle, and consist of the following:

State               Equation
0, 0                $-\lambda p_{00} + \mu' p_{01} + \mu p_{10} = 0,$   (51)
x, x for x > 0      $-(\lambda+\mu+\mu')\,p_{x,x} + (\mu+\mu')(p_{x,x+1} + p_{x+1,x}) + \lambda(p_{x,x-1} + p_{x-1,x}) = 0,$   (52)
1, 0                $-(\lambda+\mu)\,p_{10} + \mu' p_{11} + \lambda p_{00} = 0,$   (53)
x+1, x for x > 0    $-(\lambda+\mu+\mu')\,p_{x+1,x} + \mu' p_{x+1,x+1} + \lambda p_{x,x} = 0,$   (54)
0, 1                $-(\lambda+\mu')\,p_{01} + \mu p_{11} = 0,$   (55)
x, x+1 for x > 0    $-(\lambda+\mu+\mu')\,p_{x,x+1} + \mu p_{x+1,x+1} = 0.$   (56)
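
These balance equations can be checked against a direct construction of the process. The sketch below is an illustration under assumptions, not the method of the paper: it builds the transition rates from the joining and queue-changing rules, truncates the system at a hypothetical maximum total of 12 queuers (arrivals being refused beyond that), and solves for the equilibrium probabilities exactly with rational arithmetic. Equations for states well below the truncation level are unaffected by it. All names and the rates λ = 1, μ = 2, μ' = 3 are illustrative.

```python
from fractions import Fraction


def join(x, y):
    """An arrival joins the near queue when x <= y, otherwise the far queue."""
    return (x + 1, y) if x <= y else (x, y + 1)


def rebalance(x, y):
    """The last queuer of the longer queue moves if the lengths differ by two."""
    if x - y > 1:
        return (x - 1, y + 1)
    if y - x > 1:
        return (x + 1, y - 1)
    return (x, y)


def solve(a, b):
    """Exact Gauss-Jordan solution of a x = b for a nonsingular square a."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[piv] = m[piv], m[col]
        inv = 1 / m[col][col]
        m[col] = [v * inv for v in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0:
                f = m[r][col]
                m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    return [m[i][n] for i in range(n)]


def stationary(lam, mu, mu2, total=12):
    """Equilibrium probabilities of the truncated queue-changing system."""
    lam, mu, mu2 = Fraction(lam), Fraction(mu), Fraction(mu2)
    states = [(x, y) for x in range(total + 1) for y in range(total + 1)
              if abs(x - y) <= 1 and x + y <= total]
    idx = {s: i for i, s in enumerate(states)}
    n = len(states)
    q = [[Fraction(0)] * n for _ in range(n)]

    def add(src, dst, rate):
        q[idx[src]][idx[dst]] += rate
        q[idx[src]][idx[src]] -= rate

    for (x, y) in states:
        if x + y < total:
            add((x, y), join(x, y), lam)            # arrival
        if x > 0:
            add((x, y), rebalance(x - 1, y), mu)    # near-queue service
        if y > 0:
            add((x, y), rebalance(x, y - 1), mu2)   # far-queue service
    a = [[q[i][j] for i in range(n)] for j in range(n)]   # balance equations
    a[-1] = [Fraction(1)] * n                             # normalisation
    rhs = [Fraction(0)] * (n - 1) + [Fraction(1)]
    return dict(zip(states, solve(a, rhs)))
```

The returned dictionary maps each state (x, y) to its equilibrium probability, which can then be substituted into (51) to (56) for small x.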
We will use the following abbreviations: 
=, A ') = ? Pe Be 
Po=T, (At+etp’) = ree B 
‘ ‘ (1+p’)(u+p')? 
_—_ Sos! Ae." a dt ee 
LijA+adX+Kf) = A’, L(K+0d+PAp) =A, WP _=B 
, > a(a+A) 
Solving (51), (53), and (55), we obtain 
; ap p(At+z’) 
= = = — . | 
Pou ata” Pio a+A” Pu a+Aa 7 (57) 
Solving (52), (54), and (56), we obtain 
a? Aa 
ae Bs she Fe Pree or sicestiaien oe 58 
Prz BPz-1,2-1 si (u +h’)? Pr-1,2-1 (u +h’)? (Px—1,2-2 + Pz-2, 2-1); ( ) 
B, ony Au i” 
paneer — 59 
Pr-1,2 a Pr-1,2-1 a (utp pPe-be- (ut+p'y? (Pr-1,2-2 + Px—2, 2-1): (5 ) 
bu ope Au’ 
Px,2-1 =~ @ pa-be-t + (utp pePe-bet ae (utp'? (Ps-15-4 + Doe «-1)- (60) 


Substituting (57) into these three equations, we can find pj», po; and py». Using the equations 
repeatedly, we can find each p,,, in terms of 7 


Pox = ALS**n (x>0), (61) 
Pr-1,x = Au Lp**n (x> 1), (62) 
Pr,x-1 a KLB**n (%> 1), (63) 


and these may be verified by substitution into (52), (54), and (56). Summing over all values 
(and assuming / <1 to insure convergence; this condition replaces the classical p <1), we 
find an explicit formula for 7 
_ R2 
o= et ae a (64) 
(~+A)(1+p—f?)+Ap(1 +p) 
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The individual queue-length distributions, and the various moments may now be computed 
easily. Some of these values are given in the following equations (dots used in preceding 
sections are omitted): 























ae wa (69 
Py = fAn—— anh 7, (66) 
P, = KLB* 1+ adALp*n + An Lp**+*7, (67) } 
x = StAt aes, os) | 
P= Pains OE, (69) | 
pl, = AuL fn + aAL Bn + KLP**2n, (70) 
ae ~ Br, (71) 
m’ = agra t Ba, (72) | 
I2tp, = ar ae 7, (73) 
Lap, = BX oa + Bn, (74) 
ever = OB (75) 


REFERENCE

HAIGHT, F. A. (1957). Queueing with balking. Biometrika, 44, 360-69.


MOMENT ESTIMATORS AND MAXIMUM LIKELIHOOD


By L. R. SHENTON 
College of Science and Technology, Manchester 


1. Let P(x; θ_1, θ_2, ..., θ_h) be the probability of the variate x, depending on the h
parameters θ_j (j = 1, 2, ..., h). It is assumed that P(x; θ) possesses first derivatives with respect
to the θ_j, and that the moments μ'_j and their first derivatives exist and are finite. Let
{q_r(x)} be the orthogonal system of polynomials associated with P(x; θ), and write

$$q_r(x) = \sum_{j=0}^{r} a_{rj}\,x^j \qquad (a_{rr} \ne 0;\ r = 0, 1, \dots), \qquad (1)$$
where [2e(e) P(e; 0) dx = 4, 
[asee) qe) Pees8)de = 0 (rs). (2) 


To avoid undue complication at this stage we assume P(x; @) is continuous throughout its 
range. We reconsider the restrictions on P in a subsequent section. 
Now let the formal ‘Fourier’ expansions of the derivatives of P be 


$$\frac{\partial P(x; \theta)}{\partial\theta_j} = P(x; \theta)\{A_{j0}q_0(x) + A_{j1}q_1(x) + \ldots\} \quad (j = 1, 2, \ldots, h). \tag{3}$$

For the partial sum of the series in (3) we write

$$\mathcal{L}_j^{(r)}(x) = \sum_{s=0}^{r} A_{js}q_s(x), \tag{4a}$$

where

$$A_{js}\phi_s = \int q_s(x)\frac{\partial P(x; \theta)}{\partial\theta_j}\,dx = -\int P(x; \theta)\frac{\partial q_s(x)}{\partial\theta_j}\,dx, \tag{4b}$$

if the range is independent of $\theta$ (which will be assumed throughout the subsequent development).
We consider $\mathcal{L}_j^{(r)}(x)$ as an economized (or Tchebycheffian) polynomial approximation
to $\partial\log P/\partial\theta_j$, for a given value of $r$. This being the case, we may estimate the parameters $\theta_j$
from the system of equations*

$$\mathcal{L}_j^{(r)} = S\,\mathcal{L}_j^{(r)}(x)/N = 0 \quad (j = 1, 2, \ldots, h), \tag{5}$$

where the summation is over a random sample of $N$. If $\partial\log P/\partial\theta_j$ is a polynomial in $x$, then
for a determined $r$, the corresponding equation in (5) will be the associated likelihood equation.
Again under certain conditions (5), when $r \to \infty$, will become equivalent to the likelihood
equations.

Our main objective here is to determine $\operatorname{cov}(\hat\theta_{jr}, \hat\theta_{kr})$ for large samples for a solution
$(\hat\theta_{1r}, \hat\theta_{2r}, \ldots, \hat\theta_{hr})$ of (5). Since $\mathcal{E}q_s = 0$ $(s \geq 1)$, and from (4b), $A_{j0} = 0$, we may say that the
estimators are consistent (a modified definition of a consistent estimator or statistic has recently
been given by Fisher (1956, p. 144)).

* See expression (13) of my 1950 paper (to be referred to as M.L.). 


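2-1. Now consider small variations in $\hat\theta_{jr}$, which from (5) satisfy

$$\sum_{k=1}^{h}\frac{\partial\mathcal{L}_j^{(r)}}{\partial\theta_k}\,\delta\theta_k + \delta_m\mathcal{L}_j^{(r)} = 0, \tag{6}$$

where the second member refers to fluctuations in the sample moments and

$$\delta_m\mathcal{L}_j^{(r)} = \sum_{s=0}^{r} A_{js}\sum_{\lambda=0}^{s} a_{s\lambda}\,\delta m'_\lambda, \tag{7}$$

where $m'_\lambda = Sx^\lambda/N$. But using (4b)

$$\mathcal{E}\left(\frac{\partial\mathcal{L}_j^{(r)}}{\partial\theta_k}\right) = \sum_{s=0}^{r} A_{js}\,\mathcal{E}\frac{\partial q_s(x)}{\partial\theta_k} = -\sum_{s=0}^{r} A_{js}A_{ks}\phi_s = -(j,k)_r = -(k,j)_r. \tag{8}$$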
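We now write (6) in matrix form

$$((j,k)_r)\,\{\delta\theta\} = \{\delta_m\mathcal{L}^{(r)}\},$$

where $\{u\}$ is a column vector and in particular $\{\delta\theta\} = \{\delta\theta_1, \delta\theta_2, \ldots, \delta\theta_h\}$. It now follows that

$$\{\delta\theta\}\{\delta\theta\}' = ((j,k)_r)^{-1}\,\{\delta_m\mathcal{L}^{(r)}\}\{\delta_m\mathcal{L}^{(r)}\}'\,((j,k)_r)^{-1}, \tag{9a}$$

where $\{u\}'$ is the transpose of $\{u\}$. Taking expected values we have

$$\mathcal{E}\{\delta\theta\}\{\delta\theta\}' = ((j,k)_r)^{-1}/N, \tag{9b}$$

provided it can be proved that

$$\mathcal{E}(\delta_m\mathcal{L}_j^{(r)})(\delta_m\mathcal{L}_k^{(r)}) = (j,k)_r/N. \tag{10}$$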
First of all E(B, LP 8m!) = 6x A pS 8m!,dm! 


r k 
= F Age X MalHare—Lalts)/N, 
k=0 ~ A=0 
after using the well-known expression for cov (m4, m;) and 4, = &m’,. Hence 


¢ 
F (Sm 07d) = NS Ase {dele) fa" — 13} 


e 
= NE > Aj, 2°q,(x), since Aj = 0. 
k=0 
But Sq,(x)/N is a linear function of the sample moments, so 


. k 
Eb, AS 8, A) =N"€é 2 Aid) 2D Ain 12) 
from which (10) follows. ‘ 
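2-2. Returning to (9b) we find expressions for the asymptotic covariances as follows:

$$N\operatorname{var}\hat\theta_{1r} = \Delta_{11}^{(r)}/\Delta^{(r)}, \qquad N\operatorname{cov}(\hat\theta_{1r}, \hat\theta_{2r}) = \Delta_{12}^{(r)}/\Delta^{(r)},$$

and in general

$$N\operatorname{cov}(\hat\theta_{jr}, \hat\theta_{kr}) = \Delta_{jk}^{(r)}/\Delta^{(r)}, \tag{11}$$

where $\Delta_{jk}^{(r)}$ is the cofactor of $(j,k)_r$ in the matrix $((j,k)_r)$ whose determinant is $\Delta^{(r)}$.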
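
As a numerical illustration of (11), the following sketch assumes that $(j,k)_r = \sum_{s=1}^{r} A_{js}A_{ks}\phi_s$, which is how (8) and the recurrence used in § 3 read. The coefficient values are hypothetical and serve only to show that, for two parameters, the cofactor ratios shrink when a further term is included.

```python
def information_matrix(coeffs, phi, r):
    """Return ((j,k)_r) with (j,k)_r = sum_{s=1..r} A_js * A_ks * phi_s.

    coeffs[j][s] plays the role of A_js and phi[s] of phi_s (index 0 is unused,
    since A_j0 = 0).
    """
    h = len(coeffs)
    return [
        [sum(coeffs[j][s] * coeffs[k][s] * phi[s] for s in range(1, r + 1))
         for k in range(h)]
        for j in range(h)
    ]


def scaled_covariance_2x2(matrix):
    """Return N * cov = cofactor / determinant, as in (11), for h = 2."""
    (a, b), (c, d) = matrix
    det = a * d - b * c
    if det == 0:
        raise ValueError("singular matrix: r is too small for two parameters")
    # Cofactors of a 2 x 2 matrix [[a, b], [c, d]] are d, -c, -b, a.
    return [[d / det, -c / det], [-b / det, a / det]]


# Hypothetical coefficients A_js and norms phi_s, chosen only for illustration.
A = [[0.0, 1.0, 0.5, 0.2],
     [0.0, 0.3, -0.4, 0.1]]
PHI = [1.0, 1.0, 2.0, 6.0]

for r in (2, 3):
    cov = scaled_covariance_2x2(information_matrix(A, PHI, r))
    print(r, cov[0][0], cov[1][1])
```

With these made-up inputs both diagonal entries decrease from r = 2 to r = 3, in line with the monotone property discussed in § 3.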


It is now of some interest to notice that if $r \to \infty$ in (11), and we assume that the various
expansions involved remain valid, then (11) gives the usual result for the covariance of
maximum likelihood estimators in terms of weighted cofactors of

$$\left(\mathcal{E}\left(\frac{\partial\log P}{\partial\theta_j}\,\frac{\partial\log P}{\partial\theta_k}\right)\right).$$

It may be noted that there is a parallel line of thought in Geary’s (1942) proof, that the 
generalized variance (asymptotically) is a minimum for maximum likelihood estimators of 
the parameters of a population which can be described in terms of frequency compartments. 
Again it is instructive to compare the basic ideas with those which arise in curvilinear 
regression when the independent variable is discrete. In the latter case the residual variance 
decreases (in general) with each parameter introduced and finally (after using a finite 
number of parameters) becomes zero. Reference may be made to Aitken (1935, 1945). On 
the other hand, with a moment estimator, the variance decreases and reaches a limiting 
value only after including an infinite number of terms in the likelihood equations. 

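2-3. To put the matter in relief, suppose $P$ depends on one parameter $\theta$ only, then with

$$A_s\phi_s = -\int P(x; \theta)\frac{\partial q_s}{\partial\theta}\,dx, \tag{12a}$$

$\hat\theta_r$ is a solution of

$$S\sum_{s=0}^{r} A_s q_s(x)/N = 0 \tag{12b}$$

with large sample variance given by

$$(N\operatorname{var}\hat\theta_r)^{-1} = A_1^2\phi_1 + A_2^2\phi_2 + \ldots + A_r^2\phi_r \tag{12c}$$

and, assuming the validity of the expansion, asymptotic efficiency given by

$$\text{Eff.}(\hat\theta_r) = \frac{A_1^2\phi_1 + A_2^2\phi_2 + \ldots + A_r^2\phi_r}{A_1^2\phi_1 + A_2^2\phi_2 + \ldots\ \text{to}\ \infty}. \tag{12d}$$

It is clear from (12c) that we also have (for samples of $N$)

$$\operatorname{var}\hat\theta_1 \geq \operatorname{var}\hat\theta_2 \geq \operatorname{var}\hat\theta_3 \geq \ldots. \tag{12e}$$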


3. The property indicated in (12e) for a single parameter can be generalized. For it is
clear from the expansion for the generalized variance determinant given in (16) of M.L. that
$\Delta^{(r)}$ is non-decreasing (considered as a function of $r$). One would expect the same sort of
property to hold for $\operatorname{var}\hat\theta_{jr}$. For simplicity consider

$$(N\operatorname{var}\hat\theta_{hr})^{-1} = \Delta^{(r)}/\Delta_{hh}^{(r)}.$$

Now it may be proved that

$$\Delta^{(r+1)}\Delta_{hh}^{(r)} - \Delta^{(r)}\Delta_{hh}^{(r+1)} = \phi_{r+1}
\begin{vmatrix}
(1,1)_r & (1,2)_r & \cdots & (1,h)_r \\
(2,1)_r & (2,2)_r & \cdots & (2,h)_r \\
\vdots & \vdots & & \vdots \\
(h-1,1)_r & (h-1,2)_r & \cdots & (h-1,h)_r \\
A_{1,r+1} & A_{2,r+1} & \cdots & A_{h,r+1}
\end{vmatrix}^2. \tag{13}$$

This result is achieved by writing

$$(j,k)_{r+1} = (j,k)_r + A_{j,r+1}A_{k,r+1}\phi_{r+1}$$

and then introducing the bordered forms of the determinants involving these terms, finally
appealing to a pivotal condensation identity (see, for example, Aitken (1946), p. 49). Hence

$$\operatorname{var}\hat\theta_{j1} \geq \operatorname{var}\hat\theta_{j2} \geq \operatorname{var}\hat\theta_{j3} \geq \ldots \quad (j = 1, 2, \ldots, h), \tag{14}$$

and this holds for large samples of $N$.

4. As elementary examples of moment estimators we mention that for a Poisson, Binomial
(assuming the index is known) and Normal distribution the form of $\partial\log P_x/\partial\theta$ is a polynomial,
so that $\mathcal{L}^{(r)} = 0$ merely gives the usual maximum likelihood estimators. As further
illustrations we mention: (a) the Poisson distribution

$$P_x = e^{-m}m^x(1 - e^{-m})^{-1}/x! \quad (x = 1, 2, \ldots), \qquad P_x = 0 \ \text{otherwise},$$

with

$$\frac{\partial\log P_x}{\partial m} = \frac{x}{m} - \frac{1}{1 - e^{-m}},$$

so that $A_s = 0$ $(s = 0, 2, 3, \ldots)$, $A_1 \neq 0$, and

$$S\frac{\partial\log P_x}{\partial m} = A_1 Sq_1(x), \tag{15}$$

which leads to the likelihood equation for $m$,
(b) the distribution with probability density

$$P_x = e^{-x}\{2 - a + (a - 1)x\} \quad (0 < x < \infty,\ 1 < a < 2),$$

with

$$\begin{aligned}
q_0(x) &= 1, & \phi_0 &= 1, & A_0\phi_0 &= 0;\\
q_1(x) &= x - a, & \phi_1 &= -a^2 + 4a - 2, & A_1\phi_1 &= 1;\\
q_2(x) &= (-a^2 + 4a - 2)x^2 + (4a^2 - 20a + 12)x + 2a^2 + 4a - 4, &&&&\\
\phi_2 &= 4(-a^2 + 4a - 2)(-4a^3 + 15a^2 - 12a + 2), &&& A_2\phi_2 &= 4(1 - a),
\end{aligned} \tag{16}$$

and so on.

For the ‘linear’ moment estimator $a_1$,

$$\mathcal{L}^{(1)} = A_1(m'_1 - a_1) = 0$$

and $(N\operatorname{var}a_1)^{-1} = A_1^2\phi_1 = (-a^2 + 4a - 2)^{-1}$.
For the ‘quadratic’ moment estimator $a_2$,

$$\mathcal{L}^{(2)} = 0: \quad 4a_2^2 + a_2(m'_2 - 8m'_1 - 1) - m'_2 + 7m'_1 - 2 = 0$$

and

$$(N\operatorname{var}a_2)^{-1} = \frac{4a - 3}{-4a^3 + 15a^2 - 12a + 2}.$$

Similarly, the ‘cubic’ moment estimator is given by

$$\mathcal{L}^{(3)} = 0: \quad 15a_3^3 - a_3^2(m'_3 - 15m'_2 + 57m'_1 + 2) + a_3(2m'_3 - 29m'_2 + 100m'_1 - 28) - m'_3 + 14m'_2 - 44m'_1 + 16 = 0,$$

and for the variance

$$(N\operatorname{var}a_3)^{-1} = \frac{15a^2 - 20a + 6}{-15a^4 + 56a^3 - 48a^2 + 8}. \tag{17}$$
The variances of the successive estimators converge fairly rapidly to a limiting value, and
for large r 
4r 2-—a (2—a)r 
ee =: eee. Poe 2 ee eee 
N( var a,.,.;) (N vara,) aaa /( aay | )}. 
We intend to make further use of this illustrative example on another occasion. 

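5-1. Application to the negative binomial distribution. Various methods of estimating the
parameters have been considered by Anscombe (1950), and for three of them he gave some
contours of large sample efficiency (see also Evans (1956) and Haldane (1941)). We consider
the probability function

$$P_x = \frac{\alpha(\alpha + 1)\cdots(\alpha + x - 1)}{x!}\,\frac{\lambda^x\alpha^\alpha}{(\lambda + \alpha)^{\alpha + x}} \quad (\alpha, \lambda > 0,\ x = 0, 1, \ldots). \tag{18}$$

The orthogonal polynomials (Aitken & Gonin, 1935) are given by

$$q_r(x) = (1 - \lambda\Delta_x/\alpha)^{\alpha + r - 1}x^{(r)}, \tag{18a}$$

where $\Delta_x f(x) = f(x + 1) - f(x)$, $x^{(r)} = x(x - 1)\cdots(x - r + 1)$,
and there are the additional properties

$$\mathcal{E}q_r^2(x) = \lambda^r(\lambda + \alpha)^r(\alpha + r - 1)^{(r)}\,r!/\alpha^{2r} = \phi_r, \tag{18b}$$

$$\mathcal{E}x^{(s)} = \lambda^s\prod_{\nu=1}^{s-1}\left(1 + \frac{\nu}{\alpha}\right) = \mu_{(s)}. \tag{18c}$$

Moreover if

$$\frac{\partial P_x}{\partial\alpha} = P_x\sum_s A_s q_s(x), \qquad \frac{\partial P_x}{\partial\lambda} = P_x\sum_s B_s q_s(x), \tag{19a}$$

then

$$A_s\phi_s = -\mathcal{E}\frac{\partial q_s(x)}{\partial\alpha}, \qquad B_s\phi_s = -\mathcal{E}\frac{\partial q_s(x)}{\partial\lambda}. \tag{19b}$$

But

$$\begin{aligned}
\frac{\partial q_s(x)}{\partial\alpha} &= \lambda s(s + \alpha - 1)q_{s-1}(x)/\alpha^2 + \{\ln(1 - \lambda\Delta_x/\alpha)\}q_s(x)\\
&= \lambda s(s - 1)q_{s-1}(x)/\alpha^2 + \lambda^2 s(s - 1)q_{s-2}(x)/2\alpha^2 + \ldots
\end{aligned} \tag{20}$$

and

$$A_s\phi_s = (-1)^{s+1}\lambda^s(s - 1)!/\alpha^s \quad (s > 1), \qquad A_s\phi_s = 0 \quad (s = 0\ \text{or}\ 1). \tag{21}$$

Similarly, using $B_s\phi_s = \mathcal{E}\{(1 + (s - 1)/\alpha)\,s\,q_{s-1}(x)\}$,

$$B_s\phi_s = 1 \quad (s = 1); \qquad B_s = 0 \quad (s \neq 1). \tag{22}$$

Clearly from (21) and (22)

$$\mathcal{E}\left(\frac{\partial\log P_x}{\partial\alpha}\,\frac{\partial\log P_x}{\partial\lambda}\right) = 0, \tag{23}$$

so that $\hat\alpha$ and $\hat\lambda$ are asymptotically uncorrelated (this was pointed out by Anscombe (1950);
we mention in this connexion that it is also possible to choose asymptotically uncorrelated
estimates for a Neyman Type A distribution with two parameters).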
5-2. The truncated likelihood equations are (cf. (23) of M.L.)

$$Sq_1(x)/N = 0, \tag{24a}$$

$$S\left\{\frac{\alpha^2 q_2(x)}{2(\lambda + \alpha)^2(\alpha + 1)^{(2)}} - \frac{\alpha^3 q_3(x)}{3(\lambda + \alpha)^3(\alpha + 2)^{(3)}} + \frac{\alpha^4 q_4(x)}{4(\lambda + \alpha)^4(\alpha + 3)^{(4)}} - \ldots + \frac{(-\alpha)^r q_r(x)}{r(\lambda + \alpha)^r(\alpha + r - 1)^{(r)}}\right\}\Big/N = 0 \tag{24b}$$

with ‘solution’

$$\hat\lambda = Sx/N = m'_1, \qquad \hat\alpha = \alpha_{2r-3}. \tag{25}$$

Considered as an equation in $\alpha$ it may be shown that (24b) after simplification is of degree
$2r - 3$, and in particular for $r = 2$ it corresponds to the ‘moment’ estimator. We have been
unable to prove anything about the existence of solutions in the general case.

Fig. 1. 98% efficiency contours for three estimators of $\alpha$ for the negative binomial.


For the variance of $\alpha_{2r-3}$ we find (cf. M.L. (23), and Fisher (1941))

$$(N\operatorname{var}\alpha_{2r-3})^{-1} = k(u_2 + u_3 + \ldots + u_r), \tag{26}$$

where $k = \lambda^2/\{2(\lambda + \alpha)^2(\alpha + 1)^{(2)}\}$,
$u_s = 2\lambda^{s-2}(s - 1)!/\{s(\lambda + \alpha)^{s-2}(\alpha + s - 1)^{(s-2)}\}$,
and the asymptotic efficiency of $\alpha_{2r-3}$ is

$$\text{Eff.}(\alpha_{2r-3}) = \frac{u_2 + u_3 + \ldots + u_r}{u_2 + u_3 + \ldots\ \text{to}\ \infty}. \tag{27}$$

An indication of the efficiency for the ‘linear’, ‘cubic’ and ‘quintic’ estimators of $\alpha$ is
given in Figs. 1 and 2, showing respectively the 98 and 90% contours. There is obviously
a considerable gain if we use the ‘cubic’ as against the ‘linear’ estimator; the gain for the
‘quintic’ as against the ‘cubic’ is not so marked, as one might expect. Of course the efficiency
improves for any of the estimators as $\alpha$ increases ($\lambda$ fixed) for this corresponds to an approach
to the Poisson distribution with mean $\lambda$.
Fig. 2. 90% efficiency contours for three estimators of $\alpha$ for the negative binomial (horizontal axis: mean $\lambda$).

5-3. From a practical point of view it is tedious to compute a root of (24b) if the degree is
high, although one may be provided with a fairly good start (using an iterative process) from
the previous equation. To give some idea of the ‘weight’ of computation involved we remark
that (using $N^{-1}Sx^{(s)} = m_{(s)}$)

$$\begin{aligned}
N^{-1}Sq_2(x) &= m_{(2)} - (1 + \alpha^{-1})m_{(1)}^2,\\
N^{-1}Sq_3(x) &= m_{(3)} - 3(1 + 2\alpha^{-1})m_{(2)}m_{(1)} + 2(1 + 2\alpha^{-1})(1 + \alpha^{-1})m_{(1)}^3,\\
N^{-1}Sq_4(x) &= m_{(4)} - 4(1 + 3\alpha^{-1})m_{(3)}m_{(1)} + 6(1 + 3\alpha^{-1})(1 + 2\alpha^{-1})m_{(2)}m_{(1)}^2\\
&\qquad - 3(1 + 3\alpha^{-1})(1 + 2\alpha^{-1})(1 + \alpha^{-1})m_{(1)}^4,
\end{aligned} \tag{28}$$
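so that the truncated likelihood equations (24b) are:

Linear: $A(\alpha_1) = \sum_{s=0}^{1} A_s\alpha_1^s = 0$,

$$A_1 = m_{(2)} - m_{(1)}^2, \quad A_0 = -m_{(1)}^2; \qquad (N\operatorname{var}\alpha_1)^{-1} = ku_2. \tag{29a}$$

Cubic: $B(\alpha_3) = \sum_{s=0}^{3} B_s\alpha_3^s = 0$,

$$\begin{aligned}
B_3 &= m_{(2)} - m_{(1)}^2, \qquad 3B_2 = 9m_{(1)}m_{(2)} - 2m_{(3)} - 7m_{(1)}^3 + 6m_{(2)} - 9m_{(1)}^2,\\
B_1 &= 6m_{(1)}m_{(2)} - 7m_{(1)}^3 - 2m_{(1)}^2, \qquad 3B_0 = -14m_{(1)}^3;\\
&(N\operatorname{var}\alpha_3)^{-1} = k(u_2 + u_3).
\end{aligned} \tag{29b}$$

Quintic: $C(\alpha_5) = \sum_{s=0}^{5} C_s\alpha_5^s = 0$,

$$\begin{aligned}
C_5 &= m_{(2)} - m_{(1)}^2,\\
3C_4 &= -2m_{(3)} + 12m_{(1)}m_{(2)} - 10m_{(1)}^3 + 15m_{(2)} - 18m_{(1)}^2,\\
6C_3 &= 3m_{(4)} - 16m_{(1)}m_{(3)} + 36m_{(1)}^2m_{(2)} - 23m_{(1)}^4 - 12m_{(3)} + 120m_{(1)}m_{(2)}\\
&\qquad - 120m_{(1)}^3 + 36m_{(2)} - 66m_{(1)}^2,\\
3C_2 &= 90m_{(1)}^2m_{(2)} - 24m_{(1)}m_{(3)} - 69m_{(1)}^4 + 72m_{(1)}m_{(2)} - 110m_{(1)}^3 - 18m_{(1)}^2,\\
6C_1 &= 216m_{(1)}^2m_{(2)} - 253m_{(1)}^4 - 120m_{(1)}^3,\\
C_0 &= -23m_{(1)}^4;\\
&(N\operatorname{var}\alpha_5)^{-1} = k(u_2 + u_3 + u_4).
\end{aligned} \tag{29c}$$

If we substitute in (29) the population value of $\alpha$ and neglect terms involving $N^{-1}$ and higher
powers, it may be verified that

$$\mathcal{E}A(\alpha) = \mathcal{E}B(\alpha) = \mathcal{E}C(\alpha) = 0. \tag{30}$$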
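
The linear and cubic cases can be sketched in a few lines. The code below assumes the coefficients of (29a) and (29b) read $A_1 = m_{(2)} - m_{(1)}^2$, $A_0 = -m_{(1)}^2$, $B_3 = m_{(2)} - m_{(1)}^2$, $3B_2 = 9m_{(1)}m_{(2)} - 2m_{(3)} - 7m_{(1)}^3 + 6m_{(2)} - 9m_{(1)}^2$, $B_1 = 6m_{(1)}m_{(2)} - 7m_{(1)}^3 - 2m_{(1)}^2$ and $3B_0 = -14m_{(1)}^3$; the function names are illustrative only.

```python
def factorial_moment(sample, s):
    """Sample factorial moment m_(s) = N^{-1} S x(x-1)...(x-s+1)."""
    total = 0.0
    for x in sample:
        term = 1.0
        for i in range(s):
            term *= x - i
        total += term
    return total / len(sample)


def linear_estimate(m1, m2):
    """Root of A_1 * alpha + A_0 = 0, the 'linear' (moment) estimator."""
    a1 = m2 - m1 ** 2
    a0 = -m1 ** 2
    if a1 <= 0:
        raise ValueError("sample variance does not exceed the sample mean")
    return -a0 / a1


def cubic_coefficients(m1, m2, m3):
    """Return [B_0, B_1, B_2, B_3] of the 'cubic' equation."""
    b3 = m2 - m1 ** 2
    b2 = (9 * m1 * m2 - 2 * m3 - 7 * m1 ** 3 + 6 * m2 - 9 * m1 ** 2) / 3
    b1 = 6 * m1 * m2 - 7 * m1 ** 3 - 2 * m1 ** 2
    b0 = -14 * m1 ** 3 / 3
    return [b0, b1, b2, b3]


def polynomial(coeffs, alpha):
    """Evaluate sum_s coeffs[s] * alpha**s."""
    return sum(c * alpha ** s for s, c in enumerate(coeffs))


# Population factorial moments from (18c) for lambda = 2, alpha = 4.
lam, alpha = 2.0, 4.0
m1 = lam
m2 = lam ** 2 * (1 + 1 / alpha)
m3 = lam ** 3 * (1 + 1 / alpha) * (1 + 2 / alpha)
print(linear_estimate(m1, m2))
print(polynomial(cubic_coefficients(m1, m2, m3), alpha))
```

Feeding in the population moments returns the population value of $\alpha$ from the linear equation and makes the cubic vanish, which is the content of (30) for these two cases. For sample data one would pass `factorial_moment(sample, s)` instead and solve the cubic numerically.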
5-4. We now consider briefly two examples:

(i) Fisher (1941) in discussing the negative binomial distribution gives two examples
based on tick data (p. 186). For his second example some results are summarized in the form:

    N = 82,  λ̂ = 6.5609756

     r     α_{2r-3}    σ(α_{2r-3})
     2     1.549       3.48/√N
     3     1.524       3.04/√N
     4     1.592       2.91/√N
     ∞     1.778       2.80/√N

(ii) Haldane (1941) and quoted in M.L. p. 116:

    N = 1096

     r     α_{2r-3}    σ(α_{2r-3})*
     2     10.385      83.6/√N
     3      9.918      82.8/√N
     4      9.906      82.8/√N
     ∞      9.900      82.8/√N

* Calculated for the values λ = 2.157, α = 10.

The efficiency for $\alpha_1$ in (i) is only 64% while for (ii) it is 98%.
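
The figures quoted for example (ii) can be checked numerically. The sketch below assumes that (26) and (27) read $k = \lambda^2/\{2(\lambda+\alpha)^2\alpha(\alpha+1)\}$ and $u_s = 2\lambda^{s-2}(s-1)!/\{s(\lambda+\alpha)^{s-2}(\alpha+s-1)(\alpha+s-2)\cdots(\alpha+2)\}$, so that $u_2 = 1$ and $u_{s+1}/u_s = p\,s^2/\{(s+1)(\alpha+s)\}$ with $p = \lambda/(\lambda+\alpha)$; the function names are illustrative.

```python
import math


def u_terms(lam, alpha):
    """Yield u_2, u_3, ... using the ratio u_{s+1}/u_s = p s^2 / ((s+1)(alpha+s))."""
    p = lam / (lam + alpha)
    s, u = 2, 1.0
    while True:
        yield u
        u *= p * s * s / ((s + 1) * (alpha + s))
        s += 1


def partial_sum(r, lam, alpha):
    """u_2 + u_3 + ... + u_r."""
    total = 0.0
    for s, u in zip(range(2, r + 1), u_terms(lam, alpha)):
        total += u
    return total


def efficiency(r, lam, alpha, rel_tol=1e-15, max_terms=100000):
    """Asymptotic efficiency (27) of alpha_{2r-3}; the infinite sum is truncated
    once a term is negligible relative to the running total."""
    total = 0.0
    for count, u in enumerate(u_terms(lam, alpha)):
        total += u
        if u < rel_tol * total or count >= max_terms:
            break
    return partial_sum(r, lam, alpha) / total


def sigma_root_n(r, lam, alpha):
    """sqrt(N) times the large-sample standard error of alpha_{2r-3}, from (26)."""
    k = lam ** 2 / (2.0 * (lam + alpha) ** 2 * alpha * (alpha + 1.0))
    return 1.0 / math.sqrt(k * partial_sum(r, lam, alpha))


print(efficiency(2, 2.157, 10.0))
print(sigma_root_n(2, 2.157, 10.0), sigma_root_n(3, 2.157, 10.0))
```

For λ = 2.157, α = 10 this gives an efficiency of about 0.98 for the linear estimator and standard errors close to the 83.6/√N and 82.8/√N tabulated in (ii).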

6. Remarks on the validity of the expansions. 

6-1. We now consider under what conditions it is possible to assert that $\Delta_{11}^{(r)}/\Delta^{(r)} \to N\operatorname{var}\hat\theta_1$
as $r \to \infty$, where $\hat\theta_1$ is the maximum likelihood estimator (and similarly for the other
covariances). Basically this reduces to the validity of

Olog P(x; 0) dlog P(x;9)\_ 
fe i ee (3 
which in turn depends on that of 
. 2 o 
ge EON = 3 Aad (32) 
00; A=0 


I 





Thus we have to consider the conditions under which Parseval’s formula holds for $(1/P)(\partial P/\partial\theta)$
with respect to the weight function $P(x; \theta)$. Clearly we must assume the existence of the
moment sequence $\{\mu'_j\}$ so that there is a tie-up with the problem of moments and a property
of the solutions due to M. Riesz (see, for example, Shohat & Tamarkin (1943, pp. 61-6 and
Theorem 2-20)). Riesz has shown that if there is a unique $\psi(x)$ (or what is called an extremal
solution) such that 


[P wane «% 6864.1 


then Parseval’s formula, in {f(x)} dy(x) = > 7, (33) 
— s=0 


where f= [" fla)aterayte) 





holds for every $f(x)$ of the class $L^2_\psi$ (this implies the measurability of $f(x)$ with respect to $\psi(x)$
and the convergence of $\int\{f(x)\}^2\,d\psi(x)$). Thus in our application it is certainly necessary
for the Stieltjes integral appearing as the first member of (32) to converge.

6-2. We now mention briefly various criteria for deciding whether the moment problem is
determined (see Shohat & Tamarkin (1943, pp. vii-viii, 19-22); also Kendall (1943,
pp. 106-10)). For the Stieltjes moment problem (variate range 0 to $\infty$; $\psi(x)$ constant
elsewhere) there is a unique solution, if

$$\int_0^\infty\frac{d\psi(x)}{z + x} = \cfrac{1}{a_1 z + \cfrac{1}{a_2 + \cfrac{1}{a_3 z + \cfrac{1}{a_4 + \cfrac{1}{a_5 z + \cfrac{1}{a_6 + \ldots}}}}}} \tag{34}$$

is such that $a_s > 0$ and $\sum a_s$ diverges. A difficulty here is that it may not be possible to find
a comparatively simple expression for $a_s$, as is the case, for example, with Neyman’s type A
with two parameters. But for Poisson moments (mean $m$) it may be shown that $a_{2s+1} = m^s/s!$
and $a_{2s} = (s - 1)!/m^s$ so that $\sum a_s = \exp m + m^{-1}\sum_{s=0}^{\infty} s!/m^s$ which clearly diverges. Similarly,


for the negative binomial (18) it appears that 











> p a _ 1 A (L+a A) WAL +07) (L4+ Aa) 2! A(1 + 20) (35) 
eaotte 2z+1+ z+ 1+ 2+ 1+ 4 


so that after an equivalence transformation we find 


co) io) —1))\s hed ATL (1+ ra) 
ee, «* _sl(1 atl - +3 =... a > (36) 


! -1 
s=0 Oe TT (1+ ra) s=0 si(l+a A) 


which diverges since a term of the first infinite series is O(1/A). Hence if we assume the 
stochastic convergence of $\alpha_{2r-3}$ to the maximum likelihood estimator then (26) remains
valid as $r \to \infty$.

We also note here that the normal distribution moments uniquely determine a distribution,
as may be verified by an appeal to Carleman’s series test (Kendall, 1943, p. 109).

6-3. It is also possible to say something about Gram-Charlier distributions consisting of
a finite number of terms. Thus for a Type A series based on a normal or Type III probability
density, Parseval’s theorem holds provided the frequency is always positive (see, for example,
Shenton (1954, pp. 80-2)). Thus the expansions given by me in the efficiency of the method of
moments and the Gram-Charlier Type A distribution (1951) converge with certain parameter
restrictions.

For example, for the maximum likelihood estimator $\hat a_4$ of $a_4$ in

$$P(x; a_4) = C(x)g(x),$$

where $C(x) = 1 + H_4(x)a_4/4!$, $g(x) = \exp(-\tfrac{1}{2}x^2)/\sqrt{(2\pi)}$,
$x = (X - m)/\sigma$, $H_4(x)$ = Hermite polynomial of degree 4,

it turns out that if $\sigma$ is known,


ue) 
N Wvard,* 1=[" # on ais 
Parseval’s expansion holds and is equivalent to the convergence (as $r \to \infty$) to zero of


oO r 2 
[7° 902) 00) {1/0(@)- Aygo) de (38) 


with $0 < a_4 < 4$, where $\{q_s(x)\}$ is the orthogonal system with respect to $g(x)C(x)$ (see the 1951
paper, (35) and (36)).

Similarly, if $\psi(x)$ is a determined solution of a Stieltjes moment problem, then it may be
shown that the various Parseval expansions which arise in connexion with the estimation
of the parameters of a Gram-Charlier Type B development (consisting of a finite number of
terms) expressible in the form


P(x) = [om dy(t), (39) 


where z(t) is a non-negative polynomial, converge (Shenton, 1957, pp. 153-6). Thus, in 
view of the remarks in § 6-2, convergence questions arising in connexion with the estimation 
of parameters in Gram-Charlier Type B based on the Poisson and Negative Binomial (or 
geometric) distributions are readily settled. 


7. Concluding remarks. The large sample moment estimators we have introduced here 
have the unusual property that although they involve higher moments this does not imply 
larger sampling variances; on the contrary the sampling variances decrease as higher moments 
are introduced. As far as we are aware examples of this sort of behaviour are rarely met in the 
literature. It may turn out that there is a similar property for the covariances. It must be 
mentioned, however, that the property of the variances might have been anticipated when 
it is recalled that the estimators (under certain conditions) ultimately converge to maximum 
likelihood estimators. 

Our treatment here has been mainly formal and general, and covers discrete and con- 
tinuous distributions. We reserve for another occasion a discussion of the formula for the 
covariance matrix (and its relation to the derivations given by earlier writers) and remarks 
on the distributions of the moment estimators. 


I have to thank Mr A. Fletcher for drawing Figs. 1 and 2 and for assisting in some of the 
computations. 
EFFECT OF NON-NORMALITY ON THE POWER
FUNCTION OF t-TEST


By A. B. L. SRIVASTAVA 
Statistical Laboratory, Indian Institute of Technology, Kharagpur 


1. INTRODUCTION 


Student’s t-statistic provides a suitable test of significance for the mean when the sample 
comes from a normal population. The power of the normal theory test has been studied by 
Neyman (1935); Neyman & Tokarska (1936) and Johnson & Welch (1939). As in many 
cases the samples appear to belong to populations other than the Gaussian, it is necessary 
to see how far the normal theory test can be assumed to be valid in controlling the Type I 
and Type II errors of inference on non-normal samples. The effect of non-normality on the 
Type I error of Student’s t-test was studied experimentally by Pearson & Adyanthaya (1929)
and theoretically by Bartlett (1935), Geary (1936, 1947) and Gayen (1949). Assuming the
parent population to be specified by the first two terms of the Edgeworth series, Geary
(1936) obtained the approximate distribution of $t$ for any sample size, and later this work
was extended by Gayen (1949) by including in the distribution the effects of parental
kurtosis $\lambda_4 = \beta_2 - 3$ and $\lambda_3^2 = \beta_1$. Apart from the pioneer empirical study by Pearson &
Adyanthaya (1929, pp. 276-80), the effect of non-normality on the Type II error (and hence
on the power) of the t-test was first studied by Ghurye (1949). He has, however, considered
only the effect of skewness of the population, for he started with the joint distribution of the
mean and the variance for populations specified by the first two terms of the Edgeworth
series (Geary, 1936). In this paper, it has been possible to study the effects of kurtosis and
skewness of the parent population which may be assumed to cover a larger range of non- 
normality. Gayen’s (1949) formulae for the joint distribution of the sample mean and 
variance for the first four terms of the Edgeworth population have been utilized for the 
derivation of the corrective terms of the power function. 

The method followed by Ghurye for the evaluation of the corrective term of the power
function due to $\lambda_3$ appears to be satisfactory for derivation of the effects due to higher
odd-order cumulants. But for those of the even-order cumulants his method does not appear
to be useful, as Ghurye himself encountered some ‘analytic difficulties’. In this paper, by
a different approach it has been possible to evaluate integrals involved in the power function
due to $\lambda_4$ and $\lambda_3^2$.

The non-normal population considered here is supposed to be characterized by non-zero
values of the standardized third and fourth cumulants. Since the effects of the higher-order
terms depending on $\lambda_5, \lambda_6, \lambda_3\lambda_4, \lambda_4^2, \ldots$ are assumed to be negligible, the population covered
is only moderately non-normal. Too high values of $\lambda_3$ and $\lambda_4$ can also not be permitted as
they will make $f(x)$ negative at one or both tails, and will give rise to subsidiary modes.
To ensure a positive definite, unimodal frequency function, $\lambda_4$ should lie roughly between
0 and 2.4 and $\lambda_3^2 < 0.2$ (Barton & Dennis, 1952).†

Also it is found possible in this paper to calculate in the non-normal case the critical region
corresponding to any fixed probability level, to a sufficient degree of accuracy by using
the Cornish-Fisher (1937) technique.

† I am grateful to Prof. E. S. Pearson for his kindly drawing my attention to this work.


2. DERIVATION OF THE EXPRESSIONS 


Let the variate $\xi$ have mean $\mu$, standard deviation $\sigma$ and the third and fourth cumulants
$\kappa_3 = \lambda_3\sigma^3$ and $\kappa_4 = \lambda_4\sigma^4$, respectively. Assume all the higher cumulants to be zero, so that,
to the third approximation of the law of error, the frequency function of $\xi$ is

$$f\left(\frac{\xi - \mu}{\sigma}\right) = f(x) = \phi(x) - \frac{\lambda_3}{3!}\phi^{(3)}(x) + \frac{\lambda_4}{4!}\phi^{(4)}(x) + \frac{10\lambda_3^2}{6!}\phi^{(6)}(x), \tag{1}$$

where $\phi(x) = \{1/\sqrt{(2\pi)}\}\,e^{-\frac{1}{2}x^2}$, and $\phi^{(\nu)}(x)$ denotes the $\nu$th derivative of $\phi(x)$.

Let us take a sample of $n$ independent values $\xi_1, \xi_2, \ldots, \xi_n$ from this population. Now if
$x_i = (\xi_i - \mu)/\sigma$ $(i = 1, 2, \ldots, n)$, the joint distribution of $\bar x = \Sigma x_i/n$ and $s = \{\Sigma(x_i - \bar x)^2/n\}^{\frac{1}{2}}$
can be obtained from Gayen’s (1949) formula (2-11). It is found to be

nin 


7 i n— 247 nas r a om 
g(%, 8) = rian — 1] (2m) Sa 2exp {— 4n(s? + 2)} +> {3 — 3% + 383} 


na : 
+ vat |B 6a + 3 + 6s — 1) + ———_—* 8 


nx® — 3(2n + 3) F*+ 9(n + 4) Z2 — 15 + 652(n¥4 — 3(n + 3) Z2 + 6) 


fe =wvn—t) 6n(n — 2) : : 
+ 9s4 (nz*- “Sat ) + as | ‘ (2) 

Suppose we have to test the hypothesis $H_0$ ($\mu = \mu_0$) or $H_0'$ ($\mu \leq \mu_0$) against the set of
alternatives $\mu > \mu_0$. We know that for such a set of one-sided alternatives Student’s t-test is
uniformly most powerful; therefore we find the value of Student’s ratio $t$, say $t_0$, such that
the probability of exceeding this value is a predetermined $\alpha$. Our test procedure will then
be to reject the hypothesis $H_0$ if the observed value of $(\bar\xi - \mu_0)\sqrt{(n - 1)}/(s\sigma)$ exceeds $t_0$, and
accept it otherwise. Now if the hypothesis $H_0$ is not true, and $\mu$ has some alternative value
$\mu_1$, we shall determine what probability the test has of rejecting $H_0$ in this situation.
This probability gives the power of the test for the particular alternative. The complementary
probability will give the probability of accepting $H_0$ when $\mu = \mu_1$, which is the
error of the second kind. This will be obtained by integrating (2) over the region of acceptance,
i.e. the domain bounded by

$$(\bar\xi - \mu_0)\sqrt{(n - 1)}/(s\sigma) < t_0, \quad \text{i.e.} \quad \bar x < \frac{t_0 s}{\sqrt{(n - 1)}} - \frac{\rho_n}{\sqrt{n}} = \bar x_0, \ \text{say},$$

where $\rho_n = (\mu_1 - \mu_0)\sqrt{n}/\sigma$.

It may be pointed out here that if the value of $t_0$, corresponding to the level of significance
$\alpha$, is taken from the usual t-tables, the actual probability of error of the first kind will not
be $\alpha$. However, if we use such a value of $t_0$ the necessary correction to $\alpha$ due to non-normality
can be obtained from the results given in Gayen’s (1949) paper. It will be possible to obtain
similar corrections to the probability of error of the second kind from the results to be
given here. To obtain the value of $t_0$, which would correspond to a predetermined level of
significance $\alpha$ in the non-normal case, we have to take recourse to the well-known Cornish-
Fisher (1937) technique.
We can write the probability of the error of the second kind as

$$\begin{aligned}
P_{II}(t_0, n - 1, \rho_n) &= \int_{s=0}^{\infty}\int_{\bar x=-\infty}^{\bar x_0} g(\bar x, s)\,d\bar x\,ds\\
&= P_N(t_0, n - 1, \rho_n) + \lambda_3 P_{\lambda_3}(t_0, n - 1, \rho_n)\\
&\qquad + \lambda_4 P_{\lambda_4}(t_0, n - 1, \rho_n) + \lambda_3^2 P_{\lambda_3^2}(t_0, n - 1, \rho_n).
\end{aligned} \tag{3}$$

In this, $P_N(t_0, n - 1, \rho_n)$ is the probability corresponding to the normal population, which
has been dealt with in the papers of Neyman (1935), Neyman & Tokarska (1936) and others.
The power function is given by the expression $1 - P_{II}(t_0, n - 1, \rho_n)$ of which the term
$1 - P_N(t_0, n - 1, \rho_n)$ is the ‘normal theory’ power, and $-\lambda_3 P_{\lambda_3} - \lambda_4 P_{\lambda_4} - \lambda_3^2 P_{\lambda_3^2}$ are the
correction terms for non-normality.
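
The decomposition in (3) is easy to exercise numerically. The sketch below assumes the reading $P_{II} = P_N + \lambda_3 P_{\lambda_3} + \lambda_4 P_{\lambda_4} + \lambda_3^2 P_{\lambda_3^2}$ and uses the Table 1 entries for n = 10 at $\rho_n = 0$ (0.0297, 0.0009, -0.0080), together with $P_N = 0.95$, which is the normal-theory acceptance probability at the 5% level; the function name is illustrative.

```python
def corrected_power(p_normal, p_l3, p_l4, p_l3sq, lambda3, lambda4):
    """Power = 1 - P_II, with P_II assembled from the four terms of (3)."""
    p_second_kind = (p_normal
                     + lambda3 * p_l3
                     + lambda4 * p_l4
                     + lambda3 ** 2 * p_l3sq)
    return 1.0 - p_second_kind


# n = 10, rho_n = 0: the 'power' here is the actual Type I error.
P_N, P_L3, P_L4, P_L3SQ = 0.95, 0.0297, 0.0009, -0.0080

for lambda3, lambda4 in [(0.6, 0.0), (-0.6, 0.0), (0.0, 1.0)]:
    print(lambda3, lambda4,
          round(corrected_power(P_N, P_L3, P_L4, P_L3SQ, lambda3, lambda4), 3))
```

The three cases reproduce the corresponding $\rho_n = 0$ entries of Table 2 (0.035, 0.071 and 0.049), which is a useful consistency check on both tables.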

We shall now derive the expressions for $P_{\lambda_3}$, $P_{\lambda_4}$ and $P_{\lambda_3^2}$, considering each of them separately.
In what follows, we shall write


te Prt nen 
a =1+— —-, 6=__ 2. A= : ; 
=n a/(n—1) D[4(n — 1)] J (27) 28 


First of all, we have 


n—1)!e~ Spp/a* 
Py,(to, 2 — 1, Pn) = > ) 


~ 6L[4(n— 1) I v( nm) Qhn—2) qntl 
2 . 
jmta* +2) Hh,(—b)—2a*bHh,_,(—b) + —“— (p—3n +2) Hhy_o( | — 





where Hh,(—b) = f° 7 o-Ho- dy, 
0 v} 


This is Ghurye’s (1949) result, expressed in terms of the well-known Hh-function, tables of
which have been provided by Airey (1931).
We now proceed to derive $P_{\lambda_4}$, which is given by


ye ra ae 3 
Py (to.n—1,p,) = - ral e~hn(s*+29) gn—2 cca - 2n + 2ns?) | ds 


n(n—1)A, a e—gns? n—2 kal 5 2s? me" | 
PMD Meaney al 2 OM ova 


=I[,+1,, say, (5) 
where D(x) = a ein? dz, 


On substituting for x, in J, and reducing, we find 
(n—1)!e-tenla? 4 _— 1) (a2+5) 


to 
24ndngnt2 V(n—1) Hhy+(—6) 


I,=- 





denice re 3a*t, 2 
— Ba(a? +1) py Hy —B)+ 5G 99 (P20 + 1) Aig a( —) 
8 ee ‘ 2 
= n Hh,_»(—6)|. 6 
The second integral J, can be written as 
_n(n—1)A,f 1 as F ‘|. . 
I, ie 8) one lex 1) Ye) = y+ Vn (7) 


bn = I e~tns? gmdb(xq) ds. 
0 
On changing the order of integration, after suitable transformation of variables in y,,,


we find 
e~ —hpi 


~ J@—1) [, [Pexpt—anta +e2/m—1)}5*+ 519, Yn/(n—1)] 9" deat 





-tpz n *) ig r (oa) 
Far v | (Pn inv | exp [ vt dn{l ae t?/(n oe 1)} s?] gmtr+1 ds dt 














~ V(m=1) 20 —or!(n—1)3 Jo 

_ oth 5 att on TiMm+r+2)) (* _ [tly(m—Vrdt__ 
V(n —1),20 rl nmr —o[1+#/(n—-1 )]Bemtr+2) 
_ eben bm P[R(m+1)] 2 Arp, TH+ 1)] . r+1m+l1 


where wy = é§/(t§+n—1) and I,,(p,q) denotes the incomplete /-function. 
On substituting for y,,., y,, and y,,_». in J, and using the recurrence relations of the 
incomplete £-function, we get 
(m— 1) e~bhu§(1 — ug)" An S (Py a/to)” 24r-™ 
— 8nt"+1(n +1) r=0 r! 
(nm — 1) e-benla® (ny — 1)!t)A, 


. * [hal a aS 0] ; (10) 
Snrgnt2 a(n um 1) n 


i= 





P[3(n+7)] [(m +1) (1— uo) — (m+ 1)] 








since n! Hh,(—6) can be expanded as 
d(n+r—-1) 
e~ tb? 5 ent fal P[d(n+r+1)]. 


r=0 


Substituting the formulae (6) and (10) for J, and J, we finally get 








— 1)! e—der/a® 
Frlor™— 1s Pn) = sai - = a a aint 1) a?+2(n+4)} Hh,,,(—5) 
+ Bala? +1) py Hhy( —B)— (oh —m) Hy 3(—B) 
wey et 6n+3) Hhy_o(—B)]. (11) 


Lastly, we derive the expression for $P_{\lambda_3^2}$ in the same way as that for $P_{\lambda_4}$. We have after
some simplification as in the case of $P_{\lambda_4}$, the expression


Paltys2—1, Pp) << 
2(to,r—1,p,) = — -—— 
— “ 720 [h(n — 1)] 28" Jr an+4 


NaF ing(—6) + (m+ 1) (m+ 2) (a? + 2) (5a? — 2) 


x apy Hh,,.o( —b) +{-2(n +1) page 2(n +1) (3n+ 2) a? 





[{- ims 1) (n+ 2) (n +3) (a? + 2)? 





+4(n-+1) (3 +8)—12(n—2)}—" iG 2) Miya —b) + {2(5a*— 2) p 

— 6(3n + 2) a® — 24} a’p,, Hh,,( ot 5p! + 6(3n + 2) p? — 3n(3n + 4)} 
att, 

* aa 1) Final —b) + {p4 — 2(3n + 2) p? + 3(3n? + 6n — 4)} 
apy 

x a 1) Binal -0)). (12) 








3. TABULATION OF POWER FUNCTIONS


Using the expressions obtained above, some values of $P_{\lambda_3}$, $P_{\lambda_4}$ and $P_{\lambda_3^2}$ have been calculated
and tabulated in Table 1, for $\alpha = 0.05$, $n = 5, 10, 20$ and integral values of $\rho_n$ from 1 to 5.
The use of the recurrence relation of the Hh-functions, viz.

$$n\,Hh_n(-b) = b\,Hh_{n-1}(-b) + Hh_{n-2}(-b),$$

to reduce (11) and (12) to expressions involving only $Hh_{n-1}(-b)$ and $Hh_{n-2}(-b)$, was found
convenient for tabulation purposes. The values of $t_0$ used were obtained from the usual
t-tables corresponding to the upper tail area equal to 0.05. The accuracy of the results is
conditioned by that of the values of $t_0$ obtained from the table. The results are expected
to be correct to four places of decimals.
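
The recurrence can be tried out directly. The sketch below assumes the usual definition $Hh_n(x) = \int_0^\infty (v^n/n!)\,e^{-\frac{1}{2}(v+x)^2}\,dv$, with $Hh_{-1}(x) = e^{-\frac{1}{2}x^2}$ and $Hh_0(x) = \int_x^\infty e^{-\frac{1}{2}u^2}\,du$, and checks the recurrence against a plain Simpson quadrature; function names and step counts are illustrative choices.

```python
import math


def hh_sequence(x, n_max):
    """Return [Hh_0(x), ..., Hh_{n_max}(x)] from n*Hh_n(x) = Hh_{n-2}(x) - x*Hh_{n-1}(x).

    With x = -b this is the recurrence n*Hh_n(-b) = b*Hh_{n-1}(-b) + Hh_{n-2}(-b).
    """
    prev2 = math.exp(-0.5 * x * x)                                     # Hh_{-1}(x)
    prev1 = math.sqrt(math.pi / 2.0) * math.erfc(x / math.sqrt(2.0))   # Hh_0(x)
    values = [prev1]
    for n in range(1, n_max + 1):
        current = (prev2 - x * prev1) / n
        values.append(current)
        prev2, prev1 = prev1, current
    return values


def hh_by_quadrature(x, n, steps=4000):
    """Composite Simpson approximation to the defining integral of Hh_n(x)."""
    upper = max(0.0, -x) + 12.0          # integrand is negligible beyond this point
    h = upper / steps

    def integrand(v):
        return v ** n / math.factorial(n) * math.exp(-0.5 * (v + x) ** 2)

    total = integrand(0.0) + integrand(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(i * h)
    return total * h / 3.0


b = 1.0
recurrence_values = hh_sequence(-b, 4)
for n, value in enumerate(recurrence_values):
    print(n, value, hh_by_quadrature(-b, n))
```

For negative arguments (that is, b > 0) every term of the forward recurrence is positive, so no cancellation occurs; for large positive arguments the forward recurrence involves subtraction and should be used with more care.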


Table 1. Giving the values of P_λ3, P_λ4 and P_λ3² for α = 0.05

              ρ_n = 0                          ρ_n = 1
  n     P_λ3      P_λ4      P_λ3²        P_λ3      P_λ4      P_λ3²
  5    0.0343    0.0030   -0.0140       0.0075   -0.0164    0.0165
 10    0.0297    0.0009   -0.0080       0.0628   -0.0137    0.0161
 20    0.0229    0.0002   -0.0042       0.0450   -0.0088    0.0104

              ρ_n = 2                          ρ_n = 3
  n     P_λ3      P_λ4      P_λ3²        P_λ3      P_λ4      P_λ3²
  5    0.0361   -0.0380    0.0438      -0.0559   -0.0131   -0.0149
 10    0.0046   -0.0207    0.0180      -0.0597    0.0024   -0.0204
 20   -0.0048   -0.0099    0.0066      -0.0445    0.0030   -0.0110

              ρ_n = 4                          ρ_n = 5
  n     P_λ3      P_λ4      P_λ3²        P_λ3      P_λ4      P_λ3²
  5   -0.0649    0.0143   -0.0281      -0.0260    0.0112    0.0007
 10   -0.0340    0.0089   -0.0048      -0.0062    0.0025    0.0039
 20   -0.0191    0.0040   -0.0003      -0.0025    0.0007    0.0016








4, DETERMINATION OF THE CRITICAL REGION CORRESPONDING 
TO A PREDETERMINED LEVEL OF SIGNIFICANCE 


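We know from the paper of Cornish & Fisher (1937) that if $\xi_\alpha$ be a deviate corresponding to
some assigned probability level, say $\alpha$, and has a distribution with known cumulants
$\kappa_1, \kappa_2, \ldots$ and $x_\alpha$ be the normal deviate for the same probability level, then

$$\frac{\xi_\alpha - \kappa_1}{\sqrt{\kappa_2}} = x_\alpha + \frac{\kappa_3}{6\kappa_2^{3/2}}(x_\alpha^2 - 1) + \frac{\kappa_4}{24\kappa_2^2}(x_\alpha^3 - 3x_\alpha) - \frac{\kappa_3^2}{36\kappa_2^3}(2x_\alpha^3 - 5x_\alpha) + \ldots \tag{13}$$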
Now the cumulants of the t-distribution based on samples of size $n$ for our present non-normal
population, up to $O(n^{-2})$, derived from the formulae given by Geary (1947), are


as follows: r 3) 
wodehed), 
2,/n 4n 


2 2 
ky = 14+= (1+ G9) + (8-Ay— sed), 


1 3 
Ky = = (6— 204+ 1203) + 5 (18 — 6A, + 2523). 


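If we denote by $t_{\alpha,\lambda}$ the percentage point of the above t-distribution, and write $\gamma_1 = \kappa_3/\kappa_2^{3/2}$
and $\gamma_2 = \kappa_4/\kappa_2^2$, then retaining the first four terms of (13) we get

$$\frac{t_{\alpha,\lambda} - \kappa_1}{\sqrt{\kappa_2}} = x_\alpha + \tfrac{1}{6}\gamma_1(x_\alpha^2 - 1) + \tfrac{1}{24}\gamma_2(x_\alpha^3 - 3x_\alpha) - \tfrac{1}{36}\gamma_1^2(2x_\alpha^3 - 5x_\alpha). \tag{14}$$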





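Also, for $n > 5$, we have for the cumulants of the ordinary t-distribution with $n - 1$ degrees
of freedom,

$$\kappa_1 = 0, \quad \kappa_2 = \frac{n - 1}{n - 3}, \quad \kappa_3 = 0, \quad \kappa_4 = \frac{6(n - 1)^2}{(n - 3)^2(n - 5)}.$$

Hence, if $t_\alpha$ be the $100\alpha\%$ point of the t-distribution, we shall have from (13), for $n > 5$,

$$t_\alpha\sqrt{\frac{n - 3}{n - 1}} = x_\alpha + \frac{1}{4(n - 5)}(x_\alpha^3 - 3x_\alpha). \tag{15}$$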
Now from (14) and (15), we get 
& ay = ey +7 (a2—1) a (x3 — 32,) — a (203 — 52,). (16) 


Using (16), it is found that the upper 5% point of the t-distribution for $n - 1 = 9$ degrees
of freedom, corresponding to a non-normal population having $\lambda_3 = 0.6$, $\lambda_4 = 0.4$, is 1.6266.
On actual integration it gives $\alpha = 0.0509$. This shows that the above method yields results
to a sufficient degree of accuracy.
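
The four-term expansion is simple to evaluate once the cumulants are known. The sketch below assumes the standard Cornish-Fisher form of (13), namely $x + \gamma_1(x^2 - 1)/6 + \gamma_2(x^3 - 3x)/24 - \gamma_1^2(2x^3 - 5x)/36$ with $\gamma_1 = \kappa_3/\kappa_2^{3/2}$ and $\gamma_2 = \kappa_4/\kappa_2^2$; the function name is illustrative, and the example uses the cumulants of the ordinary t-distribution with 9 degrees of freedom rather than those of the non-normal case.

```python
import math
from statistics import NormalDist


def cornish_fisher_point(alpha, k1, k2, k3, k4):
    """Upper 100*alpha % point from the first four terms of the expansion.

    k1..k4 are the first four cumulants of the distribution concerned.
    """
    x = NormalDist().inv_cdf(1.0 - alpha)       # normal deviate x_alpha
    g1 = k3 / k2 ** 1.5
    g2 = k4 / k2 ** 2
    z = (x
         + g1 * (x * x - 1.0) / 6.0
         + g2 * (x ** 3 - 3.0 * x) / 24.0
         - g1 ** 2 * (2.0 * x ** 3 - 5.0 * x) / 36.0)
    return k1 + math.sqrt(k2) * z


# Ordinary t-distribution with n - 1 = 9 degrees of freedom (n = 10).
n = 10
k2 = (n - 1) / (n - 3)
k4 = 6.0 * (n - 1) ** 2 / ((n - 3) ** 2 * (n - 5))
print(cornish_fisher_point(0.05, 0.0, k2, 0.0, k4))
```

The printed value lies within about 0.005 of the tabulated upper 5% point of t with 9 degrees of freedom (1.833), which gives some feel for the accuracy of a four-term expansion at this sample size.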


5. DISCUSSION OF RESULTS 


Ghurye (1949) has provided some illustrative examples to show the effect of $\lambda_3$ on power
corresponding to significance levels $\alpha = 0.01$ and $\alpha = 0.05$. Our tabled values of $P_{\lambda_3}$ in
Table 1 for $\alpha = 0.05$, being calculated from a basically identical formula, are in agreement
with his results. So far as $\lambda_3$ is concerned, we thus arrive at the same conclusions.

It appears from a comparative study of $P_{\lambda_3}$, $P_{\lambda_4}$ and $P_{\lambda_3^2}$ of Table 1 that the effect of $\lambda_3$
on power of the t-test is, in general, greater than that of $\lambda_4$ or $\lambda_3^2$, but to study the actual
contributions of $\lambda_3$, $\lambda_4$ and $\lambda_3^2$ we shall consider some power curves for different non-normal
populations sampled.

The values of the power function are shown in Table 2 below for a sample of size 10 from a few non-normal populations specified by different values of $\lambda_3$ and $\lambda_4$ mostly within Barton & Dennis (1952) limits, so that the frequency functions are positive definite and unimodal. The critical regions are erroneously given by the upper 5 % value of the ordinary t-distribution. The exact probabilities of the Type I error, as obtained on applying corrections due to non-normality, are those shown against $\rho_n = 0$ in each case.


Table 2. Giving the values of power for different non-normal
populations when n = 10 and α = 0.05

                                      λ3
  λ4    ρ_n    -0.6    -0.4    -0.2     0.0     0.2     0.4     0.6

 -1.0    0     0.072   0.064   0.057   0.051   0.045   0.040   0.036
         1     0.254   0.245   0.234   0.222   0.209   0.195   0.179
         2     0.556   0.559   0.559   0.559   0.558   0.555   0.550
         3     0.842   0.850   0.859   0.870   0.883   0.898   0.914
         4     0.969   0.975   0.981   0.988   0.995     -       -

  0.0    0     0.071   0.063   0.056   0.050   0.044   0.039   0.035
         1     0.268   0.259   0.248   0.236   0.223   0.208   0.193
         2     0.576   0.579   0.580   0.580   0.578   0.575   0.571
         3     0.839   0.847   0.857   0.868   0.881   0.895   0.911
         4     0.960   0.966   0.972   0.979   0.986   0.993

  1.0    0     0.070   0.062   0.055   0.049   0.043   0.038   0.034
         1     0.282   0.272   0.262   0.250   0.237   0.222   0.206
         2     0.597   0.600   0.601   0.601   0.599   0.596   0.591
         3     0.837   0.845   0.854   0.866   0.878   0.893   0.909
         4     0.951   0.957   0.963   0.970   0.977   0.985   0.992

  2.0    0     0.069   0.061   0.054   0.048   0.043   0.038   0.033
         1     0.295   0.286   0.275   0.263   0.250   0.236   0.220
         2     0.618   0.620   0.622   0.621   0.620   0.616   0.612
         3     0.835   0.843   0.852   0.863   0.875   0.891   0.906
         4     0.943   0.948   0.955   0.961   0.968   0.976   0.983

From the examples considered it is clear that the effect of $\lambda_3$ is greater on the probability of Type I error than that of $\lambda_4$. Positive $\lambda_3$ reduces this probability and consequently makes the power also less than the 'normal theory' power locally. But there is an increase in the power in the region of high power. The effect due to low values of $\lambda_4$ on the power is quite small, but for lepto-kurtic populations with high values of $\lambda_4$ (say, $\lambda_4 > 1$), there is a noticeable increase in the power up to a certain point and then a subsequent decrease in the region of very high power. The effect of negative $\lambda_4$ can very well be seen to be just the reverse.

Comparison of the values of the power function, as calculated for a sample of size 10 from a population with $\lambda_3 = 0.6$ and $\lambda_4 = 0.4$ in the two cases (i) when the critical region is erroneously based on the upper (or lower) 5 % point of the normal t-distribution and (ii) when it is based on roughly the upper (or lower) 5 % point of the t-distribution corresponding to the non-normal population considered (as obtained by the method of § 4) is shown below in Table 3.
Table 3.† Showing comparison of the power when the critical regions
are erroneously and correctly obtained for n = 10

         λ3 = λ4 = 0     λ3 = 0.6, λ4 = 0.4                 λ3 = 0.6, λ4 = 0.4
  ρ_n    t0 = 1.833    t0 = 1.833   t0 = 1.627     ρ_n    t0 = -1.833   t0 = -2.076

   0       0.050         0.035        0.051          0       0.070         0.051
   1       0.236         0.198        0.259         -1     [illegible]   [illegible]
   2       0.580         0.579        0.663         -2       0.585         0.530
   3       0.868         0.910        0.940         -3       0.838         0.784
   4       0.979         0.997        0.9996        -4       0.957         0.933

It may be seen that in the case of the critical region defined by $t_0 = 1.833$, the power decreases in the region of less power as a consequence of α being reduced from 0.050 to 0.035, while in the case of the correct critical region the power is consistently greater than the 'normal theory' power. Also a similar conclusion about the increase of power in the region of less power when $\rho_n$ is negative and $t_0 = -1.833$ can be easily drawn. Thus the behaviour of the power function in the immediate neighbourhood of the null-hypothesis is found to be much influenced by the choice of an erroneous critical region on the assumption of normality of the parent population.


6. CONCLUSION 


Obtaining the expression for the power function of Student’s t-test for samples drawn from 
non-normal populations represented by the first four terms of the Edgeworth series, we have 
considered some numerical examples and derived conclusions about the nature of effects 
of parental skewness and kurtosis on the power of the t-test. Critical regions based on the 
assumption of normality of the parent population have been considered, which though 
erroneous have helped in bringing out how the power curve is distorted when the usual 
t-test is applied to a sample from a population of known skewness and kurtosis. 

The present work, it may be noted, has aimed at providing answers only to such questions 
as ‘how much non-normality can be allowed in a near-normal population without seriously 
affecting the significance level or the power of Student’s t-test’ and ‘what types of effect 
there will be in different non-normal situations'. For the derived formulae are applicable only when the values of $\lambda_3$ and $\lambda_4$ of the sampled populations are known a priori, but the situations seem to be rare in practice when $\lambda_3$ and $\lambda_4$ (and not σ) are known.

The study on the whole shows that for practical purposes, the power of the t-test is not 
seriously invalidated even if the samples are from considerably non-normal populations. 
Considering the whole range of the values of $\lambda_3$ and $\lambda_4$ within Barton & Dennis (1952) limits, the magnitudes of the effects of $\lambda_3$ and $\lambda_4$ are broadly of the same order. This is so because these limits permit values of $\lambda_3$ below 0.4 or 0.5 only, while they permit the values of $\lambda_4$ up to 2.4. The effect of skewness is more prominent when the kurtosis in the parent population is of low order, but when the population is highly lepto- or platy-kurtic, both the effects of $\lambda_3$ and $\lambda_4$ become equally prominent generally. In such a situation, sometimes they may be nullifying each other also, for example, in the case of a leptokurtic, positively


† When $\rho_n$ is negative, the hypothesis $H_0$ ($\mu \geq \mu_0$) is tested against the alternatives of the form $\mu < \mu_0$ and $-t_\alpha$, the lower 100α % point, gives the critical region. In such a case it can be easily seen that the formula for $P_{II}$ remains the same as that given by (3) of § 2 excepting for a change in the sign of $\rho_n$.
skewed population, the power of the t-test is likely to be quite close to the 'normal theory'
power in the region of low power. 

Also as is expected our results show that with increase in sample size, the effect of non- 
normality on the power of the t-test diminishes. 


I wish to acknowledge gratefully the help and guidance I have received from Dr A. K. Gayen in the course of my investigations. I am also indebted to Prof. E. S. Pearson for suggesting a number of improvements to the presentation of the paper.


REFERENCES 


Airey, J. R. (1931). British Association Mathematical Tables, 1, Table XV.
Bartlett, M. S. (1935). Proc. Camb. Phil. Soc. 31, 223.
Barton, D. E. & Dennis, K. E. (1952). Biometrika, 39, 425.
Cornish, E. A. & Fisher, R. A. (1937). Rev. Inst. Int. Statist. 5, 307.
Gayen, A. K. (1949). Biometrika, 36, 353.
Geary, R. C. (1936). J. R. Statist. Soc. Suppl. 3, 178.
Geary, R. C. (1947). Biometrika, 34, 209.
Ghurye, S. G. (1949). Biometrika, 36, 426.
Johnson, N. L. & Welch, B. L. (1939). Biometrika, 31, 362.
Neyman, J. (1935). J. R. Statist. Soc. Suppl. 2, 107.
Neyman, J. & Tokarska, B. (1936). J. Amer. Statist. Ass. 31, 318.
Pearson, E. S. & Adyanthaya, N. K. (1929). Biometrika, 21, 259.


NOTE ON MR SRIVASTAVA’S PAPER ON THE 
POWER FUNCTION OF STUDENT’S TEST 


By E. S. PEARSON
University College London 


It is perhaps of historical interest to compare Mr A. B. L. Srivastava's theory with the results of the sampling experiments which Mr N. K. Adyanthaya and I carried out nearly 30 years ago (Biometrika (1929), 21, 276-85). Tables V, VII and VIII of that paper contained the earliest comparison of power functions of statistical tests, although our frequencies were tabled in terms of the 'second kind of error', $P_{II}$. The concept of 'power', $(1 - P_{II})$, had not been formulated in mathematical terms at that date.

We were then dealing with a two-tailed test the critical region of which contained both tails of the null distribution of t. However, the power functions of the one-tailed test with α = 0.05 and the two-tailed test with α = 0.10 are very nearly the same for ρ > 1, so that the following comparisons are legitimate. Table V of our paper contains empirical values of $100P_{II}$ based on drawing 100 random samples of size n = 10 from populations having Pearson-type frequency distributions with the following moment ratios:





Population          A        B        C        D                  E

λ3 = √β1            0        0        0        0.447 (= √0.20)    0.707 (= √0.50)
λ4 = β2 - 3        -0.5      1.12     4.07     0.30               0.73





A is a Type II distribution, B and C are Type VII distributions, while D and E are Type III or χ²-distributions.

The theoretical values of $100P_{II}$ in the following table have been obtained from Srivastava's equation (3), using the values of the P-functions given in his Table 1. The observed frequencies are taken directly from Table V of Pearson & Adyanthaya's paper. For the symmetrical populations A and B the theoretical frequencies differ only slightly from those of normal theory, and apart from a high frequency for B when ρ = 3 the sampling results are consistent. For the skew populations D and E, where there is a considerable difference between the situation for positive and negative ρ, the empirical results are also consistent with Srivastava's theory. For $\lambda_3 = 0$, $\lambda_4 = \beta_2 - 3 = 4.07$, the Edgeworth curve has negative ordinates in the tails as well as subsidiary maxima, so that it cannot be regarded as representing the Pearson Type VII curve with the same moment ratios. Nevertheless, the trend of the empirical frequencies is in the same direction as that obtained from Srivastava's equations.

It was, of course, never expected that these sets of 100 samples could do more than suggest the manner 
in which the ‘normal theory’ power curve for the single sample t-test was sensitive to departure from 
normality. It is, however, satisfactory to find that the latest piece of theoretical work tends to confirm 
and round off this early empirical investigation. 





[Table: theoretical and observed values of 100P_II for the symmetrical populations A, B, C and the skew populations D, E, compared with normal theory; the entries are not recoverable.]


A QUICK ESTIMATE OF THE REGRESSION COEFFICIENT


By D. E. BARTON and D. J. CASLEY
University College London 


1. In this note we investigate some properties of a 'quick' estimator, b', of the regression coefficient β of a variable y on a variable x. This estimate has the advantage over the least squares estimate that (a) it is applicable to certain types of censored data, and (b) it provides a consistent estimator (under certain restrictions) of the slope parameter when y and x are structurally, rather than regressively, related (in the sense of Kendall (1951-2) and Neyman & Scott (1951)). It has been chiefly considered in relation to problems of this latter kind (Wald, 1940; Banerjee & Nair, 1942; Bartlett, 1949; Hidimoto, 1956), although it originated in regard to the situation where the x's were arbitrary constants (Bose, 1938; Nair & Shrivastava, 1942). We shall consider its behaviour when x and y are samples from a bivariate population. We show that it has an efficiency of 75-80 % when the population is bivariate normal by means of a conditional expectation technique which enables the results of David & Johnson (1954) to be applied.* We further examine its distribution from the point of view of its providing a quick test of the hypothesis β = 0.

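1.1. We have a set of n independent observations of (x, y) where

$$\mathcal{E}(y \mid x) = \alpha + \beta x, \qquad (1)$$

α, β being unknown constants. We order the x's so that $x_1 < x_2 < \dots < x_n$ and write

$$\bar{x}_L = \frac{1}{k}\sum_{i=1}^{k} x_i, \qquad \bar{x}_U = \frac{1}{k}\sum_{i=1}^{k} x_{n-i+1}.$$

If $y_i$ is the variable paired with $x_i$, i = 1, ..., n, we put

$$\bar{y}_L = \frac{1}{k}\sum_{i=1}^{k} y_i, \qquad \bar{y}_U = \frac{1}{k}\sum_{i=1}^{k} y_{n-i+1},$$

and compute as our estimator

$$b' = \frac{\bar{y}_U - \bar{y}_L}{\bar{x}_U - \bar{x}_L}. \qquad (2)$$

Since

$$\mathcal{E}(b' \mid x_1, \dots, x_n) = \beta, \qquad (3)$$

then

$$\mathcal{E}(b') = \beta, \qquad (4)$$

so that b' is an unbiased estimator of β.
Comparison is made with the least squares regression coefficient b, and it will be noted at once that equations (3) and (4) hold equally for this statistic.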
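
The computation of b' can be sketched in a few lines. This is only an illustration of the definition above (difference of the mean y's paired with the k largest and k smallest x's, divided by the corresponding difference of mean x's); the function name `quick_slope` and the sample data are hypothetical, and ties among the x's are not treated specially.

```python
def quick_slope(xs, ys, k):
    """Quick estimator b': (mean y of the k largest x's - mean y of the k
    smallest x's) / (mean of the k largest x's - mean of the k smallest x's)."""
    n = len(xs)
    if len(ys) != n:
        raise ValueError("xs and ys must have the same length")
    if not 1 <= k <= n // 2:
        raise ValueError("k must satisfy 1 <= k <= n/2")
    # Order the pairs by x only; each y stays attached to its own x.
    pairs = sorted(zip(xs, ys), key=lambda pair: pair[0])
    lower, upper = pairs[:k], pairs[-k:]
    x_lower = sum(x for x, _ in lower) / k
    x_upper = sum(x for x, _ in upper) / k
    y_lower = sum(y for _, y in lower) / k
    y_upper = sum(y for _, y in upper) / k
    return (y_upper - y_lower) / (x_upper - x_lower)


# Exactly linear data y = 2 + 3x: the estimator recovers the slope 3.
xs = [3.0, 1.0, 4.0, 1.5, 9.0, 2.6, 5.0, 3.5]
ys = [2.0 + 3.0 * x for x in xs]
slope = quick_slope(xs, ys, k=2)
print(slope)
```

Only the 2k extreme pairs enter the calculation, which is why the estimator remains usable when the central observations are censored.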


2. We suppose that the n bivariate observations $(x_i, y_i)$ are independent, but not necessarily from the same population, and put

$$\sigma^2(x_i) = \operatorname{var}(y_i \mid x_i),$$


* It will be noted that Brown and Mosteller (Mosteller, 1946) have considered b' as a quick estimator
of the correlation coefficient in a bivariate normal population in the case where the variances of x and y 
are both known. 
this notation indicating that $\sigma^2(x_i)$ may be a function of $x_i$ or $\mathcal{E}(x_i)$. We then have

$$\operatorname{var}(b' \mid x_1, \dots, x_n) = \frac{1}{k^2}\,\frac{\sum_{i=1}^{k}\{\sigma^2(x_i) + \sigma^2(x_{n-i+1})\}}{(\bar{x}_U - \bar{x}_L)^2}, \qquad (5)$$

where $\bar{x}_U$ and $\bar{x}_L$ are the means of the k largest and the k smallest x's, reducing when we have homoscedasticity, i.e. when $\sigma(x_i) \equiv \sigma$, to

$$\operatorname{var}(b' \mid x_1, \dots, x_n) = \frac{2\sigma^2}{k(\bar{x}_U - \bar{x}_L)^2}. \qquad (6)$$

We note that b', depending as it does on the ordered x's and not on the order of the y's, is a simple weighted mean of the y's and hence the regression coefficient b (weighted if $\sigma(x_i)$ is not constant) is more efficient than b' for a given set of $\{x_i\}$.

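Hence in general

$$\operatorname{var} b' \geq \operatorname{var} b, \qquad (7)$$

where, if we have homoscedasticity and if

$$\sigma_x^2 = \lim_{n\to\infty} \frac{1}{n}\,\mathcal{E}\Big\{\sum_{i=1}^{n} (x_i - \bar{x})^2\Big\}, \qquad (8)$$

then

$$\lim_{n\to\infty} n \operatorname{var} b = \sigma^2/\sigma_x^2. \qquad (9)$$

We may thus use the variance of b as a standard of comparison, even if b is not itself realizable, owing, for example, to incomplete knowledge of $\{x_i\}$ in the censored case.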

2.1. We now restrict ourselves to the case where the $(x_i, y_i)$ are taken from the same bivariate population with marginal probability density function of x denoted by f(x). We consider first the limit as n → ∞ and k/n → p so that $\mathcal{E}(x_{n-k+1})$ and $\mathcal{E}(x_k)$ converge to the upper and lower p-tiles of the probability density function of x, say $x_U^*$ and $x_L^*$. Then

$$\lim_{n\to\infty} n \operatorname{var} b' = \frac{1}{p}\,\frac{\sigma_U^2 + \sigma_L^2}{(\mu_U - \mu_L)^2}, \qquad (10)$$

where

$$\mu_U = \frac{1}{p}\int_{x_U^*}^{\infty} x f(x)\,dx, \qquad \sigma_U^2 = \frac{1}{p}\int_{x_U^*}^{\infty} \sigma^2(x) f(x)\,dx, \quad \text{etc.} \qquad (11)$$

Brown and Mosteller gave an analogous result in the normal case.
In the homoscedastic case this gives the efficiency of b' relative to b as

$$e_p = \tfrac{1}{2}\,p\,(\mu_U - \mu_L)^2/\sigma_x^2 \leq 1, \qquad (12)$$

and, in fact, it may be shown strictly that $e_p < 1$. A surprising result, however, is that if p is chosen to maximize $e_p$, $e_p$ is not very different from unity. For instance, when x is rectangularly distributed $e_{\max} = 8/9$, p = 1/3. Further, we shall see in the next section that if x is normally distributed $e_{\max} = 0.8098$, p = 0.2703, while $e_p > 0.75$ for 0.167 < p < 0.397. Thus b' will give a quite respectable 'quick' estimate for use in situations where speed of computation compensates for loss of information.
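
The two numerical examples just quoted can be reproduced with a short sketch. It assumes that the large-sample efficiency in the homoscedastic case has the form $e_p = \tfrac{1}{2}p(\mu_U - \mu_L)^2/\sigma_x^2$, which for a rectangular x on (0, 1) reduces to $6p(1-p)^2$ and for a unit normal x to $2Z(X_p)^2/p$, where $X_p$ is the upper p-tile and Z the normal ordinate. The function names and the ternary-search helper are illustrative, not part of the paper.

```python
from statistics import NormalDist

_UNIT_NORMAL = NormalDist()


def efficiency_normal(p):
    """Assumed large-sample efficiency for normal x: 2 * Z(X_p)^2 / p."""
    x_p = _UNIT_NORMAL.inv_cdf(1.0 - p)   # upper p-tile of the unit normal
    return 2.0 * _UNIT_NORMAL.pdf(x_p) ** 2 / p


def efficiency_rectangular(p):
    """Assumed large-sample efficiency for rectangular x: 6 p (1 - p)^2."""
    return 6.0 * p * (1.0 - p) ** 2


def argmax_unimodal(f, lo, hi, iterations=200):
    """Ternary search for the maximiser of a unimodal function on [lo, hi]."""
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            lo = m1
        else:
            hi = m2
    return 0.5 * (lo + hi)


p_normal = argmax_unimodal(efficiency_normal, 0.01, 0.5)
p_rect = argmax_unimodal(efficiency_rectangular, 0.01, 0.5)
print(round(p_normal, 4), round(efficiency_normal(p_normal), 4))
print(round(p_rect, 4), round(efficiency_rectangular(p_rect), 4))
```

Under these assumptions the search should land close to p = 0.2703 with efficiency about 0.8098 in the normal case, and p = 1/3 with efficiency 8/9 in the rectangular case.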


3. When the population from which the variables $(x_i, y_i)$ are drawn is the bivariate normal population with the parameters $(\mu_x, \mu_y, \sigma_x, \sigma_y, \rho)$, we have

$$\alpha = \mu_y - \rho\,\frac{\sigma_y}{\sigma_x}\,\mu_x, \qquad \beta = \rho\,\frac{\sigma_y}{\sigma_x}. \qquad (13)$$

It is plain in this case that

$$v = \sqrt{\frac{k}{2}}\;\frac{\sigma_x\,(b' - \beta)}{\sigma_y\sqrt{1 - \rho^2}} \qquad (14)$$

is distributed, in a manner independent of the five parameters, as the quotient of a unit normal variable and an independent variable, d, say, which is the difference between the means of the k largest and the k least of a sample of n independent unit normal variables.

The conditional distribution of v for a given d is normal and therefore symmetrical: the over-all distribution of v is therefore also symmetrical.


Fig. 1. Graph of $100e_p$, the large-sample percentage efficiency of b', expressed as a function of p, where 2p is the proportion of the observations whose actual values are used in computing b'.


We have

$$\operatorname{var} v = \mathcal{E}(d^{-2}), \qquad (15)$$

and its moment ratios are

$$\beta_1 = 0, \qquad \beta_2 = \frac{3\,\mathcal{E}(d^{-4})}{\{\mathcal{E}(d^{-2})\}^2}. \qquad (16)$$

The problems which will concern us are: (1) the value of p for which the 'large sample' efficiency of b' is greatest; (2) the value of p for which the relative efficiency of b' to b is greatest in finite samples; (3) approximations to the distributions of v in these cases.


3.1. If, following as far as possible the notation of the Introduction to Pearson & Hartley (1954), we write $X_p$ as the root of $p = Q(X)$ and put $Z_p = Z(X_p)$, $\phi_p = Z_p/p$,† we have

$$\lim \mathcal{E}(d) = 2\phi_p.$$

Hence (12) gives

$$e_p = 2p\phi_p^2, \qquad (17)$$

which is a maximum when $2X_p = \phi_p$, i.e., if $p_0$ is the root of this equation, $p_0 = 0.2703$, $e_{p_0} = 0.8098$. A graph of $e_p$ is drawn in Fig. 1.
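
The condition for the maximum can be checked directly. The following short derivation is a sketch under the stated notation, taking $Z(X)$ to be the unit normal ordinate and $Q(X)$ the upper tail area, so that $Q'(X) = -Z(X)$ and $Z'(X) = -XZ(X)$:

$$\frac{dp}{dX_p} = -Z_p, \qquad \frac{dZ_p}{dX_p} = -X_p Z_p \quad\Longrightarrow\quad \frac{dZ_p}{dp} = X_p,$$

$$e_p = 2p\phi_p^2 = \frac{2Z_p^2}{p}, \qquad \frac{de_p}{dp} = \frac{4Z_pX_p}{p} - \frac{2Z_p^2}{p^2} = 2\phi_p\,(2X_p - \phi_p).$$

Since $\phi_p > 0$, the derivative vanishes exactly when $2X_p = \phi_p$.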


† The function $\phi_p$ is tabulated in K. Pearson (1931).
3.2. In general it has not been found feasible in the small sample case to evaluate the relative efficiency of b' to b explicitly. Denoting this by E(p, n) and using the well-known result that $\operatorname{var} b = \sigma_y^2(1-\rho^2)/\{\sigma_x^2(n-3)\}$, we see that

$$E(p, n) = \tfrac{1}{2}k\,(n-3)^{-1}\{\mathcal{E}(d^{-2})\}^{-1}, \qquad (19)$$

and we have had to expand this as a series

$$E(p, n) = e_p + f_p n^{-1} + O(n^{-2}), \qquad (20)$$

by a slight extension of the methods of David & Johnson (1954). Thus

$$\mathcal{E}(d^{-2}) = \frac{1}{4\phi_p^2}\left\{1 + \frac{1}{n}\left(\frac{3V}{4\phi_p^2} - \frac{B}{\phi_p}\right) + O\!\left(\frac{1}{n^2}\right)\right\}, \qquad (21)$$

where

$$V = \lim n \operatorname{var} d, \qquad B = \lim n\,\mathcal{E}(d - 2\phi_p). \qquad (22)$$

Similarly

$$\mathcal{E}(d^{-4}) = \frac{1}{16\phi_p^4}\left\{1 + \frac{1}{n}\left(\frac{5V}{2\phi_p^2} - \frac{2B}{\phi_p}\right) + O\!\left(\frac{1}{n^2}\right)\right\}. \qquad (23)$$

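Let us consider a sample of n independent unit normal variables and order them so that $x_1 < \dots < x_n$. Then if $\bar{x}_U$ is defined as before, $\mathcal{E}(\bar{x}_U \mid x_{n-k}) = R_U$, where $R_U = Z(x_{n-k})/Q(x_{n-k})$. Hence

$$n\,\mathcal{E}(d - 2\phi_p) = 2n\,\mathcal{E}(\bar{x}_U - \phi_p) = 2n\,\mathcal{E}(R_U - \phi_p).$$

Similarly,

$$\operatorname{var} d = \mathcal{E}\{\operatorname{var}(d \mid x_{n-k}, x_{k+1})\} + \operatorname{var}\{\mathcal{E}(d \mid x_{n-k}, x_{k+1})\} = \frac{2}{k}\,\mathcal{E}\{1 + x_{n-k}R_U - R_U^2\} + \operatorname{var}(R_U - R_L),$$

where $R_L = -Z(x_{k+1})/P(x_{k+1})$.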


It is then a simple exercise in David and Johnson's technique to obtain (using their table 2),

$$B = [\text{illegible}], \qquad V = \frac{2(1 - X_p\phi_p + X_p^2)}{p}, \qquad (24)$$

when k = (n+1)p + O(1) as n → ∞.

Putting the values (24) in (21) and the result into (19) we obtain for the second coefficient in (20):

$$f_p = 4p\phi_p(\phi_p + X_p) - 2(1-p) - 3(1 - X_p\phi_p + X_p^2). \qquad (25)$$
The maximal p may be written

$$p_{\max} = p_0 + \frac{p_1}{n} + O\!\left(\frac{1}{n^2}\right),$$

where $p_0$ maximizes $e_p$ and $p_1 = -f'_{p_0}/e''_{p_0}$ (the primes here denoting differentiation with respect to p).
We have at the maximum

$$p_{\max} = 0.2703 - 0.0712\,n^{-1} + O(n^{-2}).$$
The efficiency at the optimum (or equally at $p_0$) is

$$E_{\max} = E(p_{\max}, n),$$

i.e.,

$$E_{\max} = 0.8098\{1 - 1.0201\,n^{-1} + O(n^{-2})\},$$

which suggests that 81n/(n + 1) will adequately describe the percentage efficiency in small samples. Whilst this is only a 'first-order correction' to the 'large sample result', taking it together with Fig. 1 we may reasonably conclude that if we make k/(n + 1) lie between one-sixth and one-third we shall have an efficiency of between 70 and 80 %, for samples of ten or more, with an optimum proportion about one-quarter for smaller samples and one-third for larger ones.


3.3. If we know that $\sigma_x = \sigma_y$, we may wish to use b' to provide a quick test of the hypothesis ρ = 0, or to give confidence intervals for ρ, and to do this it is necessary to know, at least approximately, the distribution of the v of equation (14). Plainly v tends to be normally distributed for large n. For finite n we may expand $\beta_2$ as a power series in $n^{-1}$ by putting the results of (24), (16) in (23). Thus

$$\beta_2 = 3\left\{1 + \frac{V}{n\phi_p^2} + O\!\left(\frac{1}{n^2}\right)\right\}, \qquad (26)$$

which gives at the value $p_0$ of p,

$$\beta_2 = 3\{1 + 3.0893\,n^{-1} + O(n^{-2})\}. \qquad (27)$$


This result suggests that we may expect an appreciable degree of leptokurtosis for sample 
sizes in the range commonly met with and that use of (27) in conjunction with the Pearson— 
Merrington table (Table 42 of Pearson & Hartley (1954)) will give an indication of the order 
of the error incurred by using the normal approximation. 
ON THE CHOICE OF THE BEST AMONGST THREE NORMAL
POPULATIONS WITH KNOWN VARIANCES*


By A. ZINGER and J. ST-PIERRE
University of Montreal, Canada


1. Summary. Statistical decisions to be taken in the case of three normal populations with 
known variances are investigated in the following two situations: 

(i) The problem of selecting the population with the largest mean (the case of the smallest 
is obviously similar), given a probability of taking a wrong decision. 

(ii) The problem of determining the smallest sample sizes required to detect the popula- 
tion with largest mean, in the case of some given values for the non-centrality parameters. 
These sample sizes depend on the probabilities of good and wrong decisions. 

The statistic used in connexion with these problems is the standardized difference between 
the two largest sample means. 


2. Introduction. In many instances, the various criteria used for testing equality of 
means do not give pertinent answers to the questions that the experimenter has in mind; 
in fact he knows that the population means are not equal. He may be interested, for example, 
in detecting the ‘best’ population (the population with largest mean). 

Several authors give tests intended to be sensitive to a single outlying mean. Irwin 
(1925b) proposed the standardized difference between the two largest observations as a test
criterion, and he earlier (1925a) gave exact and approximate results for the null distribution 
of his statistic. An exact formulation of the null distribution is also given by St-Pierre & 
Zinger (1956). McKay (1935) proposed the standardized difference between the largest 
observation and the sample mean as a test criterion, and he obtained its null distribution, 
which has been further considered, together with that of the studentized† difference, by
Pearson & Chandra Sekar (1936) and Nair (1948). None of these authors studied the power 
of their test criteria. 

The present paper takes Irwin’s statistic for three sample means, and uses it in a pro- 
cedure for detecting the largest mean, as first proposed by Bose & St-Pierre (1954). The 
non-null distribution is also derived. The design-of-experiment aspect of this problem was 
considered by Bechhofer (1954), and his solution is improved. 


3. Non-null distribution of the standardized difference between the two largest sample means.
Let us consider three normal populations with unknown means $\mu_i$ and known variances $\sigma_i^2$, i = 0, 1, 2. From the ith population, a sample of $n_i$ values $x_{ij}$, i = 0, 1, 2; j = 1, ..., $n_i$, is drawn. Let $\bar{x}_i$ be the sample mean associated with the population having mean $\mu_i$. Let $\bar{x}_{(0)} > \bar{x}_{(1)} > \bar{x}_{(2)}$ be the ordered sample means; the event $\bar{x}_i = \bar{x}_j$, i ≠ j, will of course be neglected.


Consider $\mu_{(0)} \geq \mu_{(1)} \geq \mu_{(2)}$, the ordered unknown means. It is not known which population is


* Work done under the sponsorship of the National Research Council of Canada, being part of a 
thesis by A. Zinger, written under the direction of J. St-Pierre and submitted to the University of 
Montreal in partial fulfilment of the requirements for the Ph.D. degree. 

† For a 'studentized' difference, an estimate of variance is introduced in place of a known popu-
lation value. 
associated with $\mu_{(0)}$. Only the following six mutually exclusive events are possible: $(\bar{x}_{(0)}, \bar{x}_{(1)}, \bar{x}_{(2)})$ comes from the populations with means $(\mu_{(i)}, \mu_{(j)}, \mu_{(k)})$, i ≠ j ≠ k = 0, 1, 2. It follows that the joint density of $\bar{x}_{(0)}, \bar{x}_{(1)}, \bar{x}_{(2)}$ is given by
| (MoM Ng)* 5’ ex | we. hee — hwo)? 4 2% — Ky»)? 4 Fo— cal 

(277)? oy040% 2 
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f(Fo a, Xp) = a a at 
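where Σ' means summation over all permutations of i, j, k. The test which we shall consider assumes that the values of $\sigma_0$, $\sigma_1$ and $\sigma_2$ are known and that the sample sizes are chosen so that $\sigma_i^2/n_i = \sigma^2$ (i = 0, 1, 2). We then derive the distribution of

$$u = (\bar{x}_{(0)} - \bar{x}_{(1)})/\sigma,$$

and write

$$\gamma = (\mu_{(0)} - \mu_{(1)})/\sigma, \qquad \delta = (\mu_{(1)} - \mu_{(2)})/\sigma.$$

Changing the variables to u, $(\bar{x}_{(0)} + \bar{x}_{(1)} - 2\bar{x}_{(2)})/(\sigma\sqrt{6})$ and $(\bar{x}_{(0)} + \bar{x}_{(1)} + \bar{x}_{(2)})/(\sigma\sqrt{3})$, and integrating out the last two, one gets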


d(u,y, 6) = g(u, Y> 0) + w(u, y;9), (1) 
1 2 ro) 
where g(u, y, 8) =5—- Jenn A(t) dt tetor of A(t) a , 
2./7 (u—y—28)/V6 (u—y+8)/V6 
w(u,y, 6) = ed [ert | A(t) dt + e-tusrsar A(t) dt 
2/7 (u—y—28)/V6 (u—y+8)/v6 
+ eto A(t) dt + e-toar( (t at : 
1 (u+2y+8)/V6 (u+2y+6)/V6 
where ¢ = ———e-#". Note that g(u,y,6) comes from the terms in /(%@, %q),%), where 


2,/7 


i = 0, and w(u, y, 6) from the remainder. 

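4. Integration of components of the density of u. Let us define

$$W(k, \gamma, \delta) = \int_k^{\infty} w(u, \gamma, \delta)\,du \quad\text{and}\quad G(k, \gamma, \delta) = \int_k^{\infty} g(u, \gamma, \delta)\,du.$$

These functions have been evaluated as follows:

(i) W(k, 0, δ): by numerical integration of w(u, 0, δ).

(ii) W(k, γ, δ): using the relation

$$W(k, \gamma, \delta) = W(k+\gamma, 0, \gamma+\delta) + W(k+\gamma+\delta, 0, \gamma) - W(k+2\gamma+\delta, 0, 0).$$

(iii) G(k, γ, δ): using

$$G(k, \gamma, \delta) = W(k-\gamma, 0, \delta) - \tfrac{1}{2}W(k-\gamma+\delta, 0, 0), \quad\text{if } k > \gamma,$$

and

$$G(k, \gamma, \delta) = N\big((\gamma-k)/\sqrt{2}\big) + N\big((\gamma-k+\delta)/\sqrt{2}\big) + W(\gamma-k+\delta, 0, \delta) - \tfrac{1}{2}W(\gamma-k+2\delta, 0, 0), \quad\text{if } k < \gamma,$$

where $N(x) = \int_0^x A(t)\,dt$. The functions W(k, γ, δ) and G(k, γ, δ) have been tabulated for 57 pairs of parameter values.*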


5. Detection of the best population. In order to detect the best population the following procedure is proposed:

(i) Draw $n_i$ observations from the ith population, subject to the restriction $n_i/\sigma_i^2 = 1/\sigma^2$, i = 0, 1, 2.

(ii) Let $\bar{x}_{(0)} > \bar{x}_{(1)} > \bar{x}_{(2)}$ be the ordered sample means. Compute $u = (\bar{x}_{(0)} - \bar{x}_{(1)})/\sigma$.

(iii) If u > k, decide that $\bar{x}_{(0)}$ comes from the population with mean $\mu_{(0)}$, i.e. from the best population; if u < k, do not take the above decision.
* Tables can be obtained from the authors upon request. 
Since the statistic used was first introduced by Irwin (1925b), this procedure will henceforth be referred to as the I-procedure.

The critical value k is to be chosen in such a way that the probability of a wrong choice is at most equal to a number α (0 < α < 1) given in advance. α will be called the level. A wrong decision is taken when u > k and $\bar{x}_{(0)}$ comes from the population with mean $\mu_{(1)}$ or $\mu_{(2)}$. It is readily seen, from (1), that

$$\Pr\{u > k \text{ and } \bar{x}_{(0)} \text{ comes from } \mu_{(1)} \text{ or } \mu_{(2)}\} = W(k, \gamma, \delta) \leq \alpha.$$


Let us now define the least favourable configuration. The configuration $(\gamma_0, \delta_0)$ is the least favourable for a given value of k if, for all γ and δ,

$$W(k, \gamma, \delta) \leq W(k, \gamma_0, \delta_0).$$

In general, the least favourable configuration is independent of the critical value. In the present case the least favourable configuration is not unique. It is a function of the critical value k (or of the level α). Some calculations, based on the tables, show that:

(i) for k < 0.856 (α > 0.2726) the least favourable configuration is $\gamma_0 = 0$, $\delta_0 = 0$;

(ii) for k > 0.856 (α < 0.2726) the least favourable configuration is $\gamma_0 = 0$, $\delta_0 = \infty$.

These configurations will be referred to as the null and the pseudo-null configurations.

The critical values k, given in Table 1, are obtained by solving the equations:

$$W(k, 0, 0) = \alpha, \quad\text{if } \alpha > 0.2726,$$

and

$$W(k, 0, \infty) = \alpha, \quad\text{if } \alpha < 0.2726.$$
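
For levels below 0.2726 the tabulated critical values appear to agree with a simple closed form, and the sketch below uses it. The reasoning is an assumption of this sketch rather than a statement from the paper: under the pseudo-null configuration the third population never competes, the difference of the two leading sample means divided by σ has variance 2, and a wrong decision requires the 'wrong' one of the two tied populations to lead by more than k, which gives $W(k, 0, \infty) = 1 - \Phi(k/\sqrt{2})$. The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def pseudo_null_critical_value(alpha):
    """Critical value k solving 1 - Phi(k / sqrt(2)) = alpha.

    Assumed closed form for the pseudo-null configuration (gamma = 0,
    delta infinite); only meant for levels alpha < 0.2726."""
    if not 0.0 < alpha < 0.2726:
        raise ValueError("closed form assumed only for 0 < alpha < 0.2726")
    return sqrt(2.0) * NormalDist().inv_cdf(1.0 - alpha)


for level in (0.10, 0.05, 0.01, 0.005):
    print(level, round(pseudo_null_critical_value(level), 3))
```

The values obtained this way can be compared with the entries 1.812, 2.326, 3.290 and 3.643 given in Table 1 for the levels 0.10, 0.05, 0.01 and 0.005.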


The following example shows how to apply the I-procedure.

Consider three normal populations with variances 20, 5 and 10 respectively. Let us assume that the experimenter wants a level of 0.01; hence, the critical value is 3.29 (Table 1). Suppose that the sample sizes are, respectively, 80, 20 and 40, so that σ = 0.5. Let 15.1, 14.0 and 16.9 be the sample means. It now follows that u = 3.6; consequently, the experimenter chooses the third population as the best.
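
The arithmetic of this example can be sketched as follows. The function name `i_procedure` and its return convention (the statistic u together with the zero-based index of the chosen population, or `None` when no decision is taken) are illustrative choices, not part of the paper.

```python
from math import isclose, sqrt


def i_procedure(sample_means, variances, sample_sizes, k):
    """Return (u, index of the population declared best, or None if u <= k)."""
    ratios = [v / n for v, n in zip(variances, sample_sizes)]
    # The procedure requires variance / sample size to be the same for all populations.
    if not all(isclose(r, ratios[0]) for r in ratios):
        raise ValueError("variance / sample size must be equal for all populations")
    sigma = sqrt(ratios[0])
    order = sorted(range(len(sample_means)),
                   key=lambda i: sample_means[i], reverse=True)
    # Standardized difference between the two largest sample means.
    u = (sample_means[order[0]] - sample_means[order[1]]) / sigma
    return u, (order[0] if u > k else None)


u, best = i_procedure([15.1, 14.0, 16.9], [20.0, 5.0, 10.0], [80, 20, 40], k=3.29)
print(round(u, 2), best)
```

With the figures of the example, σ² = 20/80 = 5/20 = 10/40 = 0.25, so u = (16.9 - 15.1)/0.5 = 3.6 exceeds 3.29 and the population with index 2 (the third) is chosen.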

The experimenter may be interested in the power (probability of a good decision) of the test.

The probability of a good choice is given by G(k, γ, δ), since a good decision is made when u > k and $\bar{x}_{(0)}$ comes from the population with mean $\mu_{(0)}$. The following relations are found to be helpful in the calculation of the power of the test:

$$G(k, \gamma, \delta) = G(k-\gamma, 0, \delta), \quad\text{if } k > \gamma,$$

and

$$G(k, \gamma, \delta) = G(C-k+\gamma, C, \delta), \quad\text{if } k < \gamma,$$

where C is conveniently chosen in order to use the tables.
Whenever additional information is available about the non-centrality parameters, the proposed procedure may be improved by selecting the appropriate least favourable configuration.


When it is known that δ ≤ Δ and α < 0.2726
(a) the least favourable configuration is (0, 0) and the critical value is $k_1$, if

$$\alpha = W(k_1, 0, 0) > W(k_1, 0, \Delta);$$

(b) the least favourable configuration is (0, Δ) and the critical value is $k_2$, if

$$\alpha = W(k_2, 0, \Delta) > W(k_2, 0, 0).$$
Table 1. Critical values for the I-procedure

   k        α          k        α          k        α          k        α

  0.00    0.6667      1.190   0.2000      2.30    0.0519      3.40    0.0081
  0.05    0.6387      1.20    0.1981      2.326   0.0500      3.45    0.0074
  0.10    0.6112      1.25    0.1884      2.35    0.0483      3.50    0.0067
  0.15    0.5843      1.30    0.1790      2.40    0.0448      3.55    0.0060
  0.20    0.5579      1.35    0.1699      2.45    0.0416      3.60    0.0055
  0.25    0.5320      1.40    0.1611      2.50    0.0385      3.643   0.0050
  0.30    0.5068      1.45    0.1526      2.55    0.0357      3.65    0.0049
  0.35    0.4823      1.466   0.1500      2.60    0.0330      3.70    0.0044
  0.40    0.4583      1.50    0.1444      2.65    0.0305      3.75    0.0040
  0.45    0.4351      1.55    0.1365      2.70    0.0281      3.80    0.0036
  0.50    0.4125      1.60    0.1289      2.75    0.0259      3.85    0.0032
  0.55    0.3907      1.65    0.1217      2.80    0.0239      3.90    0.0029
  0.60    0.3695      1.70    0.1147      2.85    0.0219      3.95    0.0026
  0.65    0.3491      1.75    0.1080      2.90    0.0202      4.00    0.0023
  0.70    0.3294      1.80    0.1015      2.905   0.0200      4.05    0.0021
  0.75    0.3104      1.812   0.1000      2.95    0.0185      4.10    0.0019
  0.80    0.2922      1.85    0.0954      3.00    0.0169      4.15    0.0017
  0.85    0.2746      1.90    0.0895      3.05    0.0155      4.20    0.0015
  0.856   0.2726      1.95    0.0840      3.10    0.0142      4.25    0.0013
  0.90    0.2623      2.00    0.0786      3.15    0.0130      4.30    0.0012
  0.95    0.2509      2.05    0.0736      3.20    0.0118      4.35    0.0010
  1.00    0.2398      2.10    0.0688      3.25    0.0108      4.370   0.0010
  1.05    0.2289      2.15    0.0642      3.290   0.0100      4.40    0.0009
  1.10    0.2183      2.20    0.0599      3.30    0.0098
  1.15    0.2081      2.25    0.0558      3.35    0.0089



For example, let us assume that α = 0.05 and δ ≤ 2. The solutions of $0.05 = W(k_1, 0, 0)$ and $0.05 = W(k_2, 0, 2)$ are, respectively, $k_1 = 1.96$ and $k_2 = 2.20$. It then follows that the least favourable configuration is (0, 2) and the critical value is 2.20. In the absence of information on δ the critical value is 2.326.


A particular case which is of great interest is that of testing for outliers. In that case Δ = 0 and the least favourable configuration is (0, 0). The most common values of α and k are:

    α        k
   0.10     1.556
   0.05     1.957
   0.01     2.738
   0.005    3.03





6. Numerical investigation. In order to appreciate more clearly the working of the test and to make some comparison with an alternative procedure which might be based on a test given by McKay (1935), we have carried out the investigation, the results of which are summarized in Table 2. In the first place we have calculated the exact values of the probabilities of good and wrong decisions, following the I-procedure, i.e. G(k, γ, δ) and W(k, γ, δ) (columns 4 and 5) for the various combinations of γ and δ shown in columns 1 and 2.


Table 2. Comparison between the two procedures

                                       Probability of decision
   Parameter                        I-procedure                    M-procedure
    values       Level      Theoretical        Empirical           Empirical
   γ      δ        α         G         W        G       W          G       W
  (1)    (2)      (3)       (4)       (5)      (6)     (7)        (8)     (9)

  0.2    0.2     0.240     0.164     0.185    0.156   0.186      0.102   0.126
                 0.079     0.039     0.035    0.032   0.036      0.041   0.040
                 0.017     0.005     0.004    0.009   0.001      0.012   0.002

  0.2    0.4     0.240     0.181     0.177    0.178   0.173      0.124   0.125
                 0.079     0.045     0.035    0.040   0.036      0.047   0.040
                 0.017     0.007     0.004    0.009   0.002      0.015   0.003

  0.2    2.0     0.240     0.270    (0.189)   0.270   0.192      0.304   0.250
                 0.079     0.089    (0.052)   0.080   0.052      0.145   0.113
                 0.017     0.018    (0.009)   0.017   0.007      0.050   0.044

  0.4    0.2     0.240     0.205     0.157    0.198   0.151      0.143   0.106
                 0.079     0.054     0.028    0.051   0.027      0.054   0.033
                 0.017     0.008     0.003    0.013   0.001      0.019   0.002

  0.4    0.4     0.240     0.225     0.149    0.216   0.143      0.162   0.102
                 0.079     0.062     0.027    0.058   0.026      0.062   0.033
                 0.017     0.010     0.003    0.015   0.002      0.021   0.002

  0.4    1.0     0.240     0.275     0.144    0.262   0.147      0.244   0.124
                 0.079     0.086     0.031    0.081   0.029      0.100   0.050
                 0.017     0.016     0.002    0.017   0.003      0.037   0.006

  0.4    2.0     0.240     0.321     0.155    0.314   0.160      0.372   0.215
                 0.079     0.115     0.040    0.106   0.040      0.201   0.100
                 0.017     0.026     0.006    0.026   0.005      0.075   0.030

  1.0    0.0     0.240     0.333     0.094    0.318   0.092      0.277   0.069
                 0.079     0.113     0.014    0.102   0.012      0.112   0.012
                 0.017     0.023     0.001    0.024   0.000      0.039   0.002

  1.0    1.0     0.240     0.443     0.074    0.440   0.072      0.424   0.079
                 0.079     0.183     0.013    0.172   0.013      0.229   0.028
                 0.017     0.047     0.001    0.044   0.001      0.084   0.002

  1.0    2.0     0.240     0.488    (0.072)   0.490   0.074      0.574   0.135
                 0.079     0.224    (0.015)   0.220   0.017      0.369   0.062
                 0.017     0.067    (0.002)   0.060   0.001      0.163   0.015

  2.0    0.0     0.240    [illeg.]   0.025    0.630   0.020      0.596   0.018
                 0.079    [illeg.]   0.003    0.319   0.002      0.371   0.002
                 0.017    [illeg.]   0.000    0.102   0.000      0.164   0.001

  2.0    2.0     0.240    (0.756)   (0.017)   0.758   0.017      0.839   0.048
                 0.079     0.488    (0.002)   0.492   0.001      0.698   0.020
                 0.017    [illeg.]  (0.000)   0.219   0.001      0.465   0.002

Theoretical probabilities in parentheses were obtained by interpolation.
G stands for good decision, W for wrong decision.

Three critical values k have been taken, namely 1.0, 2.0 and 3.0, corresponding as seen from
Table 1 to levels for α of 0.2398, 0.0786 and 0.0169, respectively. The following points may
be noted:

(a) For given α and γ, G increases with δ, but assuming that in practice the statistician
would not be prepared to take α as large as 0.24, γ must be fairly large (1.0 or 2.0) before
there is an appreciable chance of reaching a good decision.

(b) For the cases considered, values of W approach nearest to those of α when γ = 0.2,
δ = 2.0, i.e. in the situation approaching nearest to the most unfavourable one.

(c) Generally, W is far less than α, confirming the statement in the preceding section
that if information is available about γ or δ, the test could be made more sensitive by
reducing k.

We also carried out the following empirical investigation for the twelve pairs of (γ, δ)
values listed in Table 2. For this purpose 1000 initial triplets (x, y, z) of normal values, with
mean 2 and variance unity, have been chosen from Dixon & Massey (1951). The samples
were unexceptional since the respective means for x, y and z are 2.027, 2.025 and 2.025. Each
initial triplet was used to define twelve triplets, corresponding to the following parameter
values:

γ:  0.2  0.2  0.2  0.4  0.4  0.4  0.4  1  1  1  2  2
δ:  0.2  0.4  2.0  0.2  0.4  1.0  2.0  0  1  2  0  2

For example, the first initial triplet (2.422, 0.694, 1.875) generates the triplet (2.822, 0.894,
1.875), which is a sample from populations with means in accordance with the non-centrality
parameters γ = 0.2 and δ = 0.2. In a similar way eleven other triplets are generated.
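
The generation of the twelve triplets can be sketched as follows. The shift rule used here (x moved by γ + δ, y by δ, z unchanged, with σ = 1) is an assumption inferred from the single worked example above; the function name is illustrative only.

```python
# Sketch: build the twelve shifted triplets from one initial triplet.
# Assumption (inferred from the worked example in the text): x is shifted by
# gamma + delta, y by delta, and z is left unchanged, in units of sigma = 1.

GAMMAS = [0.2, 0.2, 0.2, 0.4, 0.4, 0.4, 0.4, 1.0, 1.0, 1.0, 2.0, 2.0]
DELTAS = [0.2, 0.4, 2.0, 0.2, 0.4, 1.0, 2.0, 0.0, 1.0, 2.0, 0.0, 2.0]


def shifted_triplets(x, y, z):
    """Return a list of (gamma, delta, shifted_triplet) for one initial triplet."""
    out = []
    for g, d in zip(GAMMAS, DELTAS):
        out.append((g, d, (x + g + d, y + d, z)))
    return out


triplets = shifted_triplets(2.422, 0.694, 1.875)
first = triplets[0][2]
print([round(v, 3) for v in first])  # [2.822, 0.894, 1.875]
```

The first generated triplet agrees with the one quoted in the text, which is the only check the text makes available.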

The resulting numbers of good and wrong decisions, expressed as proportions of 1000, are 
given in columns 6 and 7 of the table. They are in satisfactory agreement with the theoretical 
values. It is now possible to use these empirical data to make some comparisons with a pro- 
cedure based on McKay’s test. 


7. Procedure based on difference between largest sample mean and grand mean. McKay (1935)
considers a test for outliers based on a statistic which we may call $v = (x_{(n)} - \bar{x})/\sigma$, where $x_{(n)}$
is the largest (or smallest) observation in a sample of size n from a normal population and
$\bar{x}$ is the sample mean. He shows that approximately for small values of α′,


$$\alpha' \simeq n \int_{h\sqrt{n/(n-1)}}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}t^{2}}\, dt.$$
Applied to the present problem, n = 3 and $v = (2\bar{x}_{(2)} - \bar{x}_{(1)} - \bar{x}_{(0)})/(3\sigma)$. McKay's null
hypothesis H₀ is that γ = δ = 0, i.e. that $\mu_{(0)} = \mu_{(1)} = \mu_{(2)}$; although he did not consider the
alternatives to H₀ nor the power of the test, it seems likely that it would be most efficient in
cases where γ > 0, δ = 0. Using McKay's test in the sense intended, if H₀ is rejected when
v > h, a 'wrong' decision is taken when in fact γ = δ = 0. On the other hand, for the I-test,
a 'wrong' decision is taken when we conclude that $\bar{x}_{(2)}$ comes from the population with
largest mean when in fact it comes from one of the populations with mean $\mu_{(1)}$ or $\mu_{(0)}$, which
are < $\mu_{(2)}$. Thus the levels α and α′ have not the same interpretation.
If in spite of this fact, we choose k and h so that α = α′, we find

   I-procedure     Modified McKay       Level,
       (k)         or M-procedure       α = α′
                        (h)
      1.000            1.146            0.2398
      2.000            1.584            0.07865
      3.000            2.068            0.01695
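
The pairing of h with the level α′ can be checked numerically. The sketch below assumes the usual first-order form of McKay's approximation, α′ ≈ n·P{Z > h√(n/(n−1))} with Z standard normal; this form is an assumption made here for illustration, and `mckay_level` is a hypothetical helper name.

```python
import math


def mckay_level(h, n):
    """First-order approximation alpha' ~ n * P(Z > h * sqrt(n / (n - 1))).

    Assumed form of the approximation; intended for small alpha' only.
    """
    z = h * math.sqrt(n / (n - 1))
    upper_tail = 0.5 * math.erfc(z / math.sqrt(2.0))  # P(Z > z)
    return n * upper_tail


# (h, tabulated level) pairs taken from the small table above, with n = 3.
for h, level in [(1.146, 0.2398), (1.584, 0.07865), (2.068, 0.01695)]:
    print(h, round(mckay_level(h, 3), 5), level)
```

Under this assumption the two smaller levels are reproduced closely, while the largest one is only approximate, as expected of a small-α′ formula.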

Calculating $v = (2\bar{x}_{(2)} - \bar{x}_{(1)} - \bar{x}_{(0)})/(3\sigma)$ for the same twelve sets of 1000 samples as we have
used for u, we obtain the proportions of good and wrong decisions shown in columns 8 and 9
of Table 2. Since the twelve generated triplets are not independent, neither the values of u
nor those of v form a random sample; however, this does not invalidate a comparison of the
two procedures since they are applied to the same data.

It will be seen that in certain cases, e.g. when γ = 0.2, δ = 2.0 and γ = 0.4, δ = 2.0, W > α′
for the M-procedure. There is no reason why this should not be so since α′ is the risk of wrong
decision when γ = δ = 0, not the risk in a more unfavourable case with δ > 0. Although we
have not explored the matter theoretically, it appears likely that when γ is small and
δ large, v may be large because $\bar{x}_{(0)}$ is much less than $\bar{x}_{(1)}$ and $\bar{x}_{(2)}$, not because $\bar{x}_{(2)}$ is
exceptionally large. In fact, McKay's test should in this situation be first used to establish that
δ > 0, assuming γ = 0 and, afterwards (having discarded $\bar{x}_{(0)}$), to compare $\bar{x}_{(2)}$ and $\bar{x}_{(1)}$.

Some further investigation was carried out in the cases γ = 1, δ = 1; γ = 1, δ = 2;
γ = 2, δ = 2 for which Table 2 shows that the empirical values $W_M > W_I$ and $G_M > G_I$. The
critical values for the I-procedure were modified from $k_I$ to $k'_I$ in such a way that the new
probability of a wrong decision $W'_I$ is equal to $W_M$. This was done by ordering all the observed
u values ($u_1 > u_2 > \dots$) that could lead to a wrong decision, and taking

$$k'_I = \tfrac{1}{2}\bigl(u_{1000W_M} + u_{1000W_M+1}\bigr).$$

Good decisions with $k'_I$ as critical value were enumerated to give $G'_I$. Results are shown in
Table 3.
Table 3. Comparison between the two procedures in the cases
γ = 1, δ = 1; γ = 1, δ = 2 and γ = 2, δ = 2

  γ    δ     k′_I     W′_I = W_M     G′_I     G_M
  1    1     0.972      0.079        0.447    0.424
             1.512      0.028        0.290    0.229
             2.555      0.002        0.087    0.084
  1    2     0.528      0.135        0.621    0.574
             1.149      0.062        0.448    0.369
             2.047      0.015        0.210    0.163
  2    2     0.?31      0.048        0.8?2    0.839
             0.805      0.020        0.800    0.698
             1.783      0.002        0.547    0.465
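
The midpoint rule used to recalibrate the critical value can be sketched as below. The function name and the toy data are hypothetical and are not taken from the paper; only the rule k′ = ½(u_(r) + u_(r+1)), with r equal to the sample count times the target wrong-decision rate, follows the text.

```python
def recalibrated_critical_value(wrong_u_values, target_rate, n_samples=1000):
    """Midpoint rule k' = (u_(r) + u_(r+1)) / 2 with r = n_samples * target_rate.

    wrong_u_values: statistics from samples that could lead to a wrong decision.
    They are ordered so that u_(1) > u_(2) > ... before the rule is applied.
    """
    ordered = sorted(wrong_u_values, reverse=True)
    r = round(n_samples * target_rate)
    if r < 1 or r >= len(ordered):
        raise ValueError("need 1 <= r < number of candidate values")
    return 0.5 * (ordered[r - 1] + ordered[r])


# Hypothetical toy data: 10 samples in all, 5 of which could give a wrong decision.
u_wrong = [0.4, 2.1, 1.7, 0.9, 1.2]
k_new = recalibrated_critical_value(u_wrong, target_rate=0.3, n_samples=10)
print(k_new)
```

With the returned value as critical value, exactly three of the ten toy samples exceed it, so the empirical wrong-decision rate equals the target of 0.3.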
Although clearly the tests could not be brought into line in this way in practice, as
a matter of general interest we are now able to compare the probabilities of the two
procedures producing 'good' decisions when adjusted empirically so that the probabilities of
'wrong' decisions are the same. It will be seen that $G'_I$ is now always greater than $G_M$,
although there must clearly have been considerable lack of precision in determining the $k'_I$
values, particularly when $W_M$ was small.

A special comparison between the two procedures was made in the case of the detection
of one outlying population. In that case, both critical values were changed so that
$W'_I = W'_M = \max(W_I, W_M)$. The new critical values are given by

$$k'_i = \tfrac{1}{2}\bigl(u_{1000W'_i} + u_{1000W'_i+1}\bigr), \qquad i = I, M.$$

Results are shown in Table 4. This comparison may indicate that even in the detection of
outliers Irwin's statistic is better than McKay's. However, any tentative conclusions drawn
from such figures only serve to show that a more precise examination is desirable.


Table 4. Comparison between the two procedures in the cases
γ = 1.0, δ = 0 and γ = 2.0, δ = 0

   γ     δ     k′_I     k′_M     W′_I = W′_M     G′_I     G′_M
  1.0    0     1.982    1.5??       0.012        0.106    0.106
               2.497    2.135       0.002        0.047    0.036
  2.0    0     1.876    1.802       0.002        0.349    0.281
               2.480    2.066       0.001        0.203    0.164





8. Planning of an experiment to detect the best population. In the present case, the planning
of an experiment is conditioned by an assumption about the value of the difference $\mu_{(2)} - \mu_{(1)}$.
The problem is to find the smallest sample sizes $n_0$, $n_1 = n_0\sigma_1^2/\sigma_0^2$ and $n_2 = n_0\sigma_2^2/\sigma_0^2$ such that,
by applying the I-procedure, the probability of taking a wrong decision is ≤ α (0 < α < 1) and
the probability of taking a good decision is ≥ 1 − β (α ≤ β < 1), if $\mu_{(2)} - \mu_{(1)} \ge M$ is true. The
critical value k and the sample sizes $n_i$ are to be determined by the conditions:

for all δ and for all γ ≥ M/σ = Γ,   W(k, γ, δ) ≤ α   and   G(k, γ, δ) ≥ 1 − β.

Let us now define the least favourable configuration. The configuration (γ₀, δ₀) is said to be
the least favourable if for a fixed k, for all γ ≥ Γ and for all δ,

W(k, γ, δ) ≤ W(k, γ₀, δ₀)
and G(k, γ, δ) ≥ G(k, γ₀, δ₀).

Let us consider the dotted curve (Fig. 1) which divides the area into two regions A and B.
Region B is not considered, since in B the probability of taking a good decision is always
less than 1/3. Some calculations show that in region A the least favourable configuration
is (M/σ, 0). It follows that the critical value k and the sample sizes $n_i$, i = 0, 1, 2, are
determined by the equations

W(k, M/σ, 0) = α,   G(k, M/σ, 0) = 1 − β,   $n_i/\sigma_i^2 = 1/\sigma^2$.

Fig. 1 can be used to solve the system.
For example, if α = 0.05, 1 − β = 0.80, $\sigma_0^2 = 2$, $\sigma_1^2 = 4$, $\sigma_2^2 = 10$, and if M = 1, then
Fig. 1 gives M/σ = 2.15 and k = 0.49; consequently, the minimum sample sizes are
respectively 10 (9.245), 19 (18.49) and 47 (46.225).
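
The arithmetic of this example can be retraced with a few lines of code. The sketch assumes only the relation $n_i/\sigma_i^2 = 1/\sigma^2$ together with Γ = M/σ, so that $n_i = \sigma_i^2(\Gamma/M)^2$; the value Γ = 2.15 is the one read from Fig. 1 in the text, and `sample_sizes` is an illustrative name.

```python
import math


def sample_sizes(variances, big_gamma, m_diff):
    """n_i = sigma_i^2 * (Gamma / M)^2, from n_i / sigma_i^2 = 1 / sigma^2."""
    scale = (big_gamma / m_diff) ** 2
    exact = [v * scale for v in variances]
    # Round up to the next integer; the tiny offset guards against float noise.
    rounded = [math.ceil(e - 1e-9) for e in exact]
    return exact, rounded


exact, rounded = sample_sizes([2.0, 4.0, 10.0], big_gamma=2.15, m_diff=1.0)
print([round(e, 3) for e in exact], rounded)
```

The unrounded values come out as 9.245, 18.49 and 46.225, and rounding up gives the sample sizes 10, 19 and 47 quoted above.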

It is to be noted that the proposed procedure does not imply the necessity of always taking
a decision. However, if α = β, then k = 0 and a decision is always taken: this is Bechhofer's
solution (1954; p. 30, column k = 3, t = 1).

Fig. 1. The W-curves have negative slopes. The G-curves have positive slopes.
W(k, γ, 0) = α, G(k, γ, 0) = 1 − β.


9. Optimum solution. When a decision is required, it is possible to improve on Bechhofer's
solution. This improvement is obtained by a repeated application of the procedure proposed
in §8.

Let $\mu_{(2)} - \mu_{(1)} \ge M$ be the assumed information. Let α and 1 − β be, respectively, the bounds
for the probabilities of taking a wrong and a good decision at each step of the procedure. The
critical value k and the sample sizes $n_i$, i = 0, 1, 2, are determined as before. If, in applying
the I-procedure, no decision is reached, then another set of samples of sizes $n_i$, i = 0, 1, 2, is
drawn and the I-procedure is applied once again. This is repeated until a decision is reached.
It is readily seen that, for this new procedure, the bounds for the probabilities of taking
a wrong and a good decision are, respectively, α/(1 + α − β) and (1 − β)/(1 + α − β). The
expected value of the total sample size for the ith population, i = 0, 1, 2, is given by
$n_i/(W + G)$, where W = W(k, γ, δ) and G = G(k, γ, δ). A true bound for the expected total
sample size for the ith population, i = 0, 1, 2, is $n_i/(1 - \beta)$.
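
These bounds follow from a geometric series. A short derivation, assuming that the successive sets of samples are independent and that W and G denote the single-stage probabilities of a wrong and a good decision, runs as follows:

$$P(\text{wrong}) = \sum_{j=0}^{\infty} (1 - W - G)^{j}\, W = \frac{W}{W + G} \le \frac{\alpha}{1 + \alpha - \beta}, \qquad P(\text{good}) = \frac{G}{W + G} \ge \frac{1 - \beta}{1 + \alpha - \beta},$$

$$E(\text{number of stages}) = \sum_{j=1}^{\infty} j\,(1 - W - G)^{j-1}(W + G) = \frac{1}{W + G} \le \frac{1}{1 - \beta}.$$

The inequalities use W ≤ α and G ≥ 1 − β: the ratio W/(W + G) increases with W and decreases with G, while 1/(W + G) ≤ 1/G.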
Let us now denote by F the bound for the probability of taking a wrong decision in this
new procedure. The problem is to determine the values α and 1 − β (or the critical value k and
the sample sizes $n_i$) in such a way that: (i) α/(1 + α − β) = F, and (ii) $n_i/(1 - \beta)$ or Γ²/(1 − β) is
minimized, where $\Gamma = M\sqrt{n_i}/\sigma_i$, as defined in §8. Since W(k, Γ, 0) = α and G(k, Γ, 0) = 1 − β,
it follows that α and 1 − β are functions of k and Γ. Lagrange's method gives as solution the
roots in k and Γ of the system


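$$\frac{\alpha}{1 + \alpha - \beta} = F,$$
$$\Gamma\left(\frac{\partial\alpha}{\partial k}\,\frac{\partial(1-\beta)}{\partial\Gamma} - \frac{\partial\alpha}{\partial\Gamma}\,\frac{\partial(1-\beta)}{\partial k}\right) = 2\left((1-\beta)\frac{\partial\alpha}{\partial k} - \alpha\,\frac{\partial(1-\beta)}{\partial k}\right). \qquad (2)$$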
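
One way to see where a stationarity condition of this kind comes from is sketched below. The shorthand A = α(k, Γ), B = 1 − β(k, Γ) and H = Γ²/B is introduced here for illustration only, and the printed arrangement of the condition may differ. Minimizing H subject to A/(A + B) = F requires the gradients of H and of A/(A + B) to be parallel:

$$\frac{\partial H}{\partial k}\,\frac{\partial}{\partial\Gamma}\!\left(\frac{A}{A+B}\right) = \frac{\partial H}{\partial\Gamma}\,\frac{\partial}{\partial k}\!\left(\frac{A}{A+B}\right),$$

with

$$\frac{\partial H}{\partial k} = -\frac{\Gamma^{2} B_k}{B^{2}}, \qquad \frac{\partial H}{\partial\Gamma} = \frac{2\Gamma}{B} - \frac{\Gamma^{2} B_\Gamma}{B^{2}}, \qquad \frac{\partial}{\partial k}\!\left(\frac{A}{A+B}\right) = \frac{A_k B - A B_k}{(A+B)^{2}}, \qquad \frac{\partial}{\partial\Gamma}\!\left(\frac{A}{A+B}\right) = \frac{A_\Gamma B - A B_\Gamma}{(A+B)^{2}}.$$

After multiplying through by $B^{2}(A+B)^{2}/\Gamma$, the terms in $A B_k B_\Gamma$ cancel, and division by B leaves

$$\Gamma\,(A_k B_\Gamma - A_\Gamma B_k) = 2\,(B A_k - A B_k).$$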


A solution of the above system is obtained as follows. The following functions of k and Γ,

F = W(k, Γ, 0)/{W(k, Γ, 0) + G(k, Γ, 0)}
and H = Γ²/G(k, Γ, 0),

are readily obtained from the tables. A family of curves H = H(F, Γ) is drawn on Fig. 2 for
certain values of Γ. It can be proved that the equations of the envelope of this family of
curves are precisely the system (2). Thus, an optimum solution is given by the envelope
drawn on Fig. 2.





[Figure: the envelope and Bechhofer's solution plotted against F, for F from 0.01 to 0.20.]

Fig. 2. Optimum solutions.


Let us consider the following example. Suppose that F = 0.02; then Γ²/G = 7.4 and
Γ = 2.27, according to Fig. 2. It follows that G = 0.696. Using Fig. 1, it is found that
k = 1.04 and W = 0.0145. Assuming that M = 0.5, and that the variances are, respectively,
1, 2 and 1.5, the sample sizes are 21 (20.61), 42 (41.22) and 31 (30.92). The bounds for the
expected total sample sizes are 30, 60 and 45. The sample sizes with Bechhofer's method are
42, 85 and 64.
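
The figures in this example can be retraced as follows. The sketch takes Γ, G, W and M as quoted in the text (the first three being graphical readings), and assumes that sample sizes and bounds are rounded up to the next integer; that rounding rule is an assumption, since the text does not state it.

```python
import math

# Values quoted in the text; GAMMA, G and W are readings from Figs. 1 and 2.
GAMMA, G, W, M = 2.27, 0.696, 0.0145, 0.5
VARIANCES = [1.0, 2.0, 1.5]

# n_i = sigma_i^2 * (Gamma / M)^2, before rounding.
per_stage = [v * (GAMMA / M) ** 2 for v in VARIANCES]
sizes = [math.ceil(n) for n in per_stage]          # sample sizes per stage
bounds = [math.ceil(n / G) for n in per_stage]     # bound n_i / (1 - beta), with 1 - beta = G
overall_wrong = W / (W + G)                        # F = W / (W + G)

print([round(n, 2) for n in per_stage], sizes, bounds, round(overall_wrong, 4))
```

The computed sizes and bounds agree with the text, and W/(W + G) comes out close to the prescribed F = 0.02, the small difference being attributable to reading values from the figures.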
Bechhofer's solution Γ²(F) (Γ(F) is √N λ in Bechhofer's notation) is drawn on Fig. 2 for
comparative purposes. It is readily seen that for F < 0.16 the proposed procedure improves
on Bechhofer's.


Table 5. Comparison with Bechhofer's procedure

    F       Γ²/G     Γ²(F)     Gain (%)      Γ        k
   0.10     4.5       ?          ?          1.63     0.75
   0.05     5.75     7.?4        28         1.92     0.89
   0.01     8.6     13.08        34         2.48     1.18
   0.005    9.7     15.62        38         2.68     1.30








In conclusion the authors wish to thank the editor and the referee for their helpful
suggestions.
APPROXIMATE FORMULAE FOR THE STATISTICAL
DISTRIBUTIONS OF EXTREME VALUES


By J. J. DRONKERS
Ruychrocklaan 180, The Hague


0.1. SUMMARY
This paper deals with the distribution function of the order statistics

$x_{n-m+1}$ (m = 1, 2, ...), $M_{n,m}(x)$.

For this distribution function of the mth largest value (if m is counted from above)
approximate formulae are derived.

These formulae are generalizations of the corresponding approximate formulae for the 
distribution of the extremes proper. Successively the initial distributions f(x) are supposed 
to be of exponential type, Cauchy type or of limited type (finite range). 

We first deal with the basic conditions to be imposed on the initial distributions f(x). An 
expansion formula has been derived for the distribution of the excess, 1 — F(x), which plays 
an important part in the investigations of this paper. 

We then consider the general formula of the initial distribution of the mth values,
$M_{n,m}(x)$. Appropriate formulae have been derived to determine the mode and the maximum
value of $M_{n,m}(x)$. The behaviour of the maximum value by varying n and m has also been
studied and approximate formulae for $M_{n,m}(x)$ are successively deduced. Every succeeding
formula has a more restricted range of application. Finally, limiting expressions for $M_{n,m}(x)$
are deduced for the three types of initial distributions mentioned above. The well-known
limiting functions of Fisher–Tippett and Gumbel are deduced again. Formulae for the
determination of the interval of application have been deduced.


0.2. INTRODUCTION


The present situation of the theory of extreme values has been described by Gumbel (1954). 
He also gives a short history of this theory and discusses some practical problems. From 
a mathematical point of view there are two ways to deal with the problem of extreme values. 

The first one, used by Fisher & Tippett (1928), starts from the functional equation:

$$F^{n}(x) = F(a_n x + b_n),$$
which has the following meaning:

Assume we have N samples each of size n. From each sample the largest value is taken, 
and the maximum of the NV samples of size n is the maximum in a sample of Nn. Then Fisher 
& Tippett point out that the distribution of the largest value F in a sample of size Nn is the 
same as the distribution of the largest value in a sample of size n, except for a linear trans- 
formation. Then the limiting distributions of the extreme values are determined as three 
solutions of the functional equation. Each solution corresponds to a certain type of initial 
distribution, respectively of exponential type, Cauchy type and of certain limited range 
distributions. 
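
A quick numerical illustration of this stability property is given below for the double-exponential limiting form F(x) = exp(−e^{−x}), for which the linear transformation is a_n = 1, b_n = −ln n. The choice of this particular distribution and of n = 50 is made here for illustration only.

```python
import math


def gumbel_cdf(x):
    """Double-exponential (Gumbel) cdf F(x) = exp(-exp(-x))."""
    return math.exp(-math.exp(-x))


def max_of_n_cdf(x, n):
    """cdf of the largest of n independent values, F(x)^n."""
    return gumbel_cdf(x) ** n


n = 50
a_n, b_n = 1.0, -math.log(n)

# Largest discrepancy between F^n(x) and F(a_n x + b_n) over a few points.
max_gap = max(
    abs(max_of_n_cdf(x, n) - gumbel_cdf(a_n * x + b_n))
    for x in (-1.0, 0.5, 3.0, 6.0, 9.0)
)
print(max_gap)
```

The discrepancy is at the level of floating-point rounding, as the identity exp(−e^{−x})ⁿ = exp(−e^{−(x − ln n)}) holds exactly.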

In a recent publication Jenkinson (1955) gives a general solution of the functional equa- 
tion, which includes all three of the Fisher—Tippett solutions. See also Gnedenko (1943). 
The second method of determining these limiting distributions has been introduced by
von Mises (1936); see also the paper of Wilks (1948). He supposes that in case x → ∞ the
limits of certain functions $A_1(x)$ and $A_2(x)$, mentioned in § 0.4 of the present paper, exist and
take certain values. The function $A_1(x)$ has been connected with initial distributions of
exponential type and $A_2(x)$ with distributions of Cauchy type.

The mathematical treatment of the theory of extreme values dealt with in this paper is 
closely related with the method of von Mises. However, we introduce a supposition which 
is different from those of von Mises and which enables us to derive approximate formulae for 
the distribution of the extreme values and for the deviations from the corresponding exact 
formulae. In comparison with the application of the limiting formulae these approximate 
formulae may be applied for smaller values of n and may therefore be called ‘transitional’ 
approximations. 

A more general treatment has been given by deducing the formulae for the so-called mth 
values. Gumbel (1935), who also deals with this general case, considers only the initial 
distribution of exponential type. If m = 1, we get the formulae for the extremes themselves.

To illustrate the application of the different approximate formulae, we take the normal 
distribution. The well-known limiting distribution function of the extreme values, which 
has been derived for the exponential types, is not however related in a simple way to the 
normal distribution, unless we consider very large values of n. It appears that applications 
of the transitional approximate formulae derived in this paper are more useful for 
smaller n. 

We remark that in the case of the normal distribution, the convergence of the limiting 
function can be improved by a suitable choice of the parameters of the asymptotic distribu- 
tion (see de Finetti, 1932). This is an artifice for this special case. The approximate formulae 
derived in this paper have a more general range of application. 


I should like to express my thanks to the referee for his comments which gave the paper its 
final form. 
0.3. CONDITIONS


Throughout this paper we consider populations with continuous cumulative distribution
functions (cdf), which have at least three finite derivatives.

A one-dimensional continuous cdf F(x) with finite derivatives f(x) = F′(x), f′(x) = F″(x),
f″(x) = F‴(x) is assumed to satisfy the following conditions:

(a) f(b) = 0, where b is the upper boundary of the distribution;

(b) $\lim_{x\uparrow b} B(x) = \lim_{x\uparrow b}\left(\dfrac{f}{f'}\right)' = k$ exists (and also $\lim_{x\downarrow a} B(x)$, but in this paper mainly the upper
boundary b is being considered).


0.4. REMARKS ON THE ASSUMPTION (b)


The value of k depends on the range of the initial distribution f(x).

THEOREM 0.4.1. If f varies over a range with upper bound b, then 0 ≤ k ≤ ∞. If b = ∞,
then −1 ≤ k ≤ 0.

We leave out of consideration the case k = ∞, e.g. f(x) → −1/log (b − x).
The value of k depends on the asymptotic behaviour of f(x). The following statement can be
made:

THEOREM 0.4.2. If f(x) has an infinite range and belongs to the Cauchy type,* then
−1 < k < 0.

If f(x) belongs to the exponential type,* k = 0.

Let f(x) have a finite range. If we transform f(x) into f(x′) according to x′ = 1/(b − x), x < b,
f(x′) will have an infinite range. If f(x′) belongs to the Cauchy type, k > 0 for f(x) (e.g.
f(x) = (b − x)).

If f(x′) belongs to the exponential type, k = 0 for f(x).
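
As a check on these statements, the quantity B(x) = (f/f′)′ of condition (b) can be evaluated directly for three simple densities. The examples are chosen here for illustration (normalizing constants are omitted because they cancel in f/f′), and the exponents p and s are symbols introduced for this sketch:

$$f(x) = e^{-x^{2}/2}: \quad \frac{f}{f'} = -\frac{1}{x}, \quad B(x) = \frac{1}{x^{2}} \to 0, \quad k = 0;$$

$$f(x) = x^{-(p+1)}\ (p > 0): \quad \frac{f}{f'} = -\frac{x}{p+1}, \quad B(x) = -\frac{1}{p+1}, \quad -1 < k < 0;$$

$$f(x) = (b - x)^{s}\ (s > 0): \quad \frac{f}{f'} = -\frac{b - x}{s}, \quad B(x) = \frac{1}{s}, \quad k > 0.$$

The first is of exponential type, the second of Cauchy type and the third of limited range, in agreement with the signs of k given in the theorems.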

To prove these theorems we have to put

$$B(x) = \left(\frac{f}{f'}\right)' = k - \phi_1(x) \qquad \bigl(\lim_{x\uparrow b}\phi_1(x) = 0\bigr). \qquad (1)$$

Then we may write
$$f(x) = f(c)\exp\int_c^x \frac{dy}{(f/f')_{c} + k(y - c) - \int_c^y \phi_1(t)\,dt}. \qquad (2)$$

Further, c has to be chosen near b, so that
$$\left|\int_c^y \phi_1(t)\,dt\right| < \epsilon\,(y - c) \qquad (y > c).$$

The conclusions follow from examination of the behaviour of the integral in (2), in
connexion with supposition (b) of § 0.3.
The condition $\lim_{x\to\infty} B(x) = 0$ is closely related to the following condition which von Mises
(1936) has set forth to derive the well-known limiting distribution of the maximum value
when f(x) belongs to the exponential type





_. d 1—F(z) a 
= —_— CSC OETCIT A = — 1. 3 
rn aC ae ee as 
According to l’ H6pital’s rule, we may write further 
lim ra — B(x)) = -1, 
aro f 
in virtue of the existence of lim B(x). It follows that the condition of von Mises is equivalent
to the supposition $\lim_{x\to\infty} B(x) = 0$ when f(x) is of exponential type. Conversely, if the existence
of the limit of von Mises has been supposed, it is not necessary that lim B(x) exists.

In case f(x) belongs to the Cauchy type (the upper bound b is infinite), von Mises supposes
that
$$\lim_{x\to\infty} A_2(x) = \lim_{x\to\infty}\frac{x f(x)}{1 - F(x)} = p > 0. \qquad (4)$$

Applying l'Hôpital's rule twice, we get





me <a ‘. _ af’ p+2 
Pee ian eee 
Since $\lim_{x\to\infty}(-x f'/f) = 1 + p$, we have k = −1/(p + 1), −1 < k < 0. Also, in the case of f(x)
being of Cauchy type, the condition of von Mises is equivalent to assumption (b), provided
k = −1/(p + 1).

* For the definition of Cauchy types and exponential types see the paper of Gumbel (1954). In
contradiction to the notation of Gumbel, we call distributions, where all moments exist, of exponential
type, e.g. $f(x) = \exp(-\log^{s} x)$ (s > 1, x > 0).
0.5. APPROXIMATE FORMULAE FOR THE PROBABILITY OF THE EXCESS 1 − F(x)


The function 1 − F(x), determining the probability of x being equalled or exceeded, figures in
the formulae of the extreme values. The following expansion for 1 − F(x) is very useful for
the determination of approximate formulae. Define

$$y(x) = -\ln f(x), \qquad (5)$$

so that
$$1 - F(x) = \int_x^b f(t)\,dt = \int_y^{\infty} e^{-y}\,\frac{dx}{dy}\,dy;$$
b may be finite or infinite.
From repeated integration by parts may be deduced

$$\int_y^{\infty} e^{-y}\,\frac{dx}{dy}\,dy = e^{-y}\left\{\frac{dx}{dy} + \frac{d^{2}x}{dy^{2}} + \dots + \frac{d^{n}x}{dy^{n}}\right\} + \int_y^{\infty} e^{-y}\,\frac{d^{n+1}x}{dy^{n+1}}\,dy, \qquad (6)$$

supposing that the higher derivatives $d^{k}f/dx^{k}$ (k = 3, ..., n + 1) exist for the interval (x, b).
Then

$$\frac{dx}{dy} = -\frac{f}{f'}, \qquad \frac{d^{2}x}{dy^{2}}\,\frac{dy}{dx} = -B(x).$$

Generally
$$\frac{d^{n}x}{dy^{n}}\,\frac{dy}{dx} = -\frac{d^{n-1}x}{dy^{n-1}}\,\frac{dy}{dx}\,B(x) + \frac{dx}{dy}\,\frac{d}{dx}\left[\frac{d^{n-1}x}{dy^{n-1}}\,\frac{dy}{dx}\right], \qquad (7)$$

$$\frac{d^{n}x}{dy^{n}} = f(x)\,\frac{\text{polynomial in } f(x),\ f'(x),\ \dots,\ f^{(n)}(x)}{\{f'(x)\}^{2n-1}}. \qquad (8)$$
It is easy to prove that lim e-¥ wa = tin 2 -, = Oif also f’(b) = 0. 


ath dy ta 
The demonstration follows from (1) after integration and by remarking that, if 


b=, f(x) - lim p(w) = 0 (x>Cc). 


Further the values of $\lim_{x\uparrow b}\dfrac{d^{n}x}{dy^{n}}\dfrac{dy}{dx}$ may be determined, supposing the limits exist and are finite.

First we show that
$$\lim_{x\uparrow b}\frac{dx}{dy}\,\frac{d}{dx}\left[\frac{d^{n-1}x}{dy^{n-1}}\,\frac{dy}{dx}\right] = 0.$$

The existence of this limit is obvious from (7). Further, it is clear that $\lim_{x\uparrow b}\dfrac{d^{n-1}x}{dy^{n-1}}\dfrac{dy}{dx}$ would be
infinite in case of the value of the limit of the derivative being unequal to zero. This is in
contradiction with the supposition. Therefore from (7) and supposition (b) of § 0.3 it follows
that
$$\lim_{x\uparrow b}\frac{d^{n}x}{dy^{n}}\,\frac{dy}{dx} = -k\lim_{x\uparrow b}\frac{d^{n-1}x}{dy^{n-1}}\,\frac{dy}{dx},$$
so that
$$\lim_{x\uparrow b}\frac{d^{n}x}{dy^{n}}\,\frac{dy}{dx} = (-k)^{n-1}. \qquad (9)$$
Further, we put, by introducing once more the initial distribution f(x) (see (1) if n = 2),
$$\frac{d^{n}x}{dy^{n}} = \bigl[\phi_{n-1}(x) + (-k)^{n-1}\bigr]\frac{dx}{dy} \qquad (n = 2, 3, \dots), \qquad (10)$$
while $\lim_{x\uparrow b}\phi_{n-1}(x) = 0$. It also follows that
$$\lim_{y\to\infty}\left(e^{-y}\,\frac{dx}{dy}\right)\left(\frac{d^{n}x}{dy^{n}}\,\frac{dy}{dx}\right) = 0.$$
Hence we may write for (6)
$$1 - F(x) = -\frac{f^{2}}{f'}\bigl\{1 - k + \dots + (-k)^{n-1} + \phi_1(x) + \dots + \phi_{n-1}(x)\bigr\} + (-k)^{n}\bigl(1 - F(x)\bigr) + \int_x^b f(t)\,\phi_n(t)\,dt. \qquad (11)$$
As f(t) > 0 and $\lim_{t\uparrow b}\phi_n(t) = 0$, the mean value theorem is applied:
$$\int_x^b f(t)\,\phi_n(t)\,dt = \phi_n(\xi_n(x))\bigl(1 - F(x)\bigr) \qquad (x < \xi_n(x) < b).$$
In the following we put $\psi_n(x)$ instead of $-\phi_n(\xi_n(x))$.
Finally, it may be deduced from (11) for k > −1
$$1 - F(x) = -\frac{f^{2}}{f'}\,\frac{1}{1 + k}\,\frac{1 - (-k)^{n} + (1 + k)\{\phi_1(x) + \dots + \phi_{n-1}(x)\}}{1 - (-k)^{n} + \psi_n(x)}, \qquad (12)$$
$$\lim_{x\uparrow b}\psi_n(x) = 0 \qquad (n \ge 1).$$
In case k > 1 we may not consider an infinite series (n → ∞) or n even if k = 1. Formula
(12) also holds good if k = 0.
In this paper mainly the case n = 1 is considered, so that
$$1 - F(x) = -\frac{f^{2}}{f'}\,\frac{1}{1 + k + \psi(x)}, \qquad (13)$$
$$\psi(x) = -\phi(\xi(x)) = \left(\frac{f}{f'}\right)'_{x = \xi(x)} - k \qquad \bigl(\lim_{x\uparrow b}\psi(x) = 0;\ \xi(x) = \xi_1(x)\bigr).$$
Remark 1. In the case of the normal distribution $f(x) = (2\pi)^{-1/2}e^{-x^{2}/2}$, for which k = 0, the
expansion (12) is identical with the well-known asymptotic expansion, obtained by direct
repeated partial integration,
$$1 - F(x) = \frac{e^{-x^{2}/2}}{x\sqrt{2\pi}}\left[1 - \frac{1}{x^{2}} + \frac{1\cdot 3}{x^{4}} - \dots + (-1)^{n-1}\frac{1\cdot 3\cdot 5\cdots(2n-3)}{x^{2n-2}}\right]\frac{1}{1 + \psi_n(x)}, \qquad (14)$$
while
$$\psi_n(x) = (-1)^{n-1}\,\frac{1\cdot 3\cdot 5\cdots(2n-1)}{\xi^{2n}} \qquad (x < \xi(x) < \infty).$$

It is well known that in this case the error committed in stopping at any stage in the
asymptotic series is less than the next term in the series.
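
The statement about the error of the truncated series can be checked numerically. The sketch below uses the standard normal upper tail computed from the complementary error function and compares each partial sum of the asymptotic series with the first omitted term; the evaluation point x = 3 is an arbitrary choice made here.

```python
import math


def normal_tail(x):
    """Upper tail 1 - F(x) of the standard normal distribution."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def tail_series_terms(x, n_terms):
    """Terms of phi(x)/x * [1 - 1/x^2 + 1*3/x^4 - 1*3*5/x^6 + ...]."""
    phi = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
    terms, coeff = [], 1.0
    for j in range(n_terms):
        terms.append(phi / x * coeff)
        coeff *= -(2 * j + 1) / (x * x)  # next coefficient in the series
    return terms


x = 3.0
terms = tail_series_terms(x, 5)
exact = normal_tail(x)

# Pair each truncation error with the magnitude of the first omitted term.
errors = []
partial = 0.0
for j in range(4):
    partial += terms[j]
    errors.append((abs(exact - partial), abs(terms[j + 1])))

for err, nxt in errors:
    print(err, nxt, err < nxt)
```

At x = 3 every truncation error is smaller than the first term left out, in line with the remark.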
Remark 2. According to (3), (13) and k = 0, for the function $A_1(x)$ of von Mises may also
be written
$$A_1(x) = -\frac{1}{1 + \psi(x)} \qquad \bigl(\lim_{x\to\infty}\psi(x) = 0\bigr),$$
so that it appears again that the condition (3) is closely related to the condition lim B(x) = 0.
1. THE MAXIMUM VALUE OF $M_{n,m}(x)$


1.1. The formula for $M_{n,m}(x)$
Let x denote the mth value from the top in a sample of size n. For its probability element
$M_{n,m}(x)\,dx$ we have
$$M_{n,m}(x) = n\binom{n-1}{m-1}\{F(x)\}^{n-m}\{1 - F(x)\}^{m-1} f(x). \qquad (15)$$
Analogously for the probability element $L_{n,m}(x)\,dx$ of the mth value from the bottom
$$L_{n,m}(x) = n\binom{n-1}{m-1}\{F(x)\}^{m-1}\{1 - F(x)\}^{n-m} f(x). \qquad (16)$$

In case of m > ½n we consider the frequency distribution of the (n − m + 1)th value from the
bottom.
Putting m = 1, we have
$$M_n(x) = n f(x)\{F(x)\}^{n-1}, \qquad (15a)$$
$$L_n(x) = n f(x)\{1 - F(x)\}^{n-1}. \qquad (15b)$$
For approximate evaluation of the factorial in (15) and (16) the Stirling formula may be
applied. If n, m and n − m are sufficiently large we may write


ee " eyes n—m n—1 J (17) 
m—1, a n—m 2m(m—1)(n—m)} ’ 


eee 
Pan ims 2n 12m 12(n =a ‘ 


If n is sufficiently large and m sufficiently small, it is preferable to maintain (m − 1)!. In this
case the approximation can be deduced by means of the Stirling formula

$$\frac{n!}{(n-m)!} \simeq n^{m}\exp\left(-\frac{m^{2} - m}{2n} - \frac{2m^{3} - 3m^{2} + m}{12n^{2}}\right).$$

Here the usual expansion 


( n es rn ee 


(18) 


has been used. 
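
The quality of the two-term approximation n!/(n − m)! ≈ nᵐ exp{−(m² − m)/(2n) − (2m³ − 3m² + m)/(12n²)} can be judged with a small numerical check. The values n = 100 and m = 3 are chosen here for illustration, and the function names are hypothetical.

```python
import math


def falling_factorial_exact(n, m):
    """n! / (n - m)! computed as the product n (n - 1) ... (n - m + 1)."""
    result = 1
    for j in range(m):
        result *= n - j
    return result


def falling_factorial_approx(n, m):
    """Two-term approximation n^m * exp(-(m^2 - m)/(2n) - (2m^3 - 3m^2 + m)/(12 n^2))."""
    exponent = -(m * m - m) / (2.0 * n) - (2 * m**3 - 3 * m * m + m) / (12.0 * n * n)
    return n**m * math.exp(exponent)


n, m = 100, 3
ratio = falling_factorial_approx(n, m) / falling_factorial_exact(n, m)
print(falling_factorial_exact(n, m), ratio)
```

For these values the exact product is 970200 and the approximation differs from it by only a few parts in a million, which supports keeping (m − 1)! exact when m is small.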
In this paper the further treatment of the function $L_{n,m}(x)$, m < ½n, is left out of
consideration, because it is analogous with the one of $M_{n,m}(x)$.
Further initial distributions f(x) are considered satisfying the conditions of § 0.3, supposing
that $\lim_{x\uparrow b} B(x) = k$ is not equal to −1 or ∞; for the rest b may be finite or infinite.


1.2. The modal value $u_{n,m}$ of $M_{n,m}(x)$


The modal value x = $u_{n,m}$ for which $M_{n,m}(x)$ has a maximum value, satisfies the equation
$$\frac{n - m}{F(u_{n,m})} - \frac{m - 1}{1 - F(u_{n,m})} + \left(\frac{f'}{f^{2}}\right)_{u_{n,m}} = 0, \qquad (19)$$

which may be found by differentiating $M_{n,m}(x)$ at (15), with respect to x and equating the
result to zero.
We may write for (19)
$$n - m = F(u_{n,m})\left[\frac{m - 1}{1 - F(u_{n,m})} - \left(\frac{f'}{f^{2}}\right)_{u_{n,m}}\right]. \qquad (20)$$
Now the following theorems can be stated:

THEOREM 1.2.1. For every n > an assigned $n_1$, one modal value $u_{n,m}$ of the frequency
distribution $M_{n,m}(x)$ lies in the interval (c, b) (c sufficiently near b). If n increases and m is
supposed to be fixed, $u_{n,m}$ increases also, while $\lim_{n\to\infty} u_{n,m} = b$.
Conversely, let n be fixed and m a variable, then $u_{n,m_2} < u_{n,m_1} < u_n$ (m = 1) for $m_1 < m_2$.

Proof. An interval c < x < b exists, for which the function $Ff'/f^{2}$ is negative and steadily
decreases to −∞, if x increases from c to b. For $(f^{2}/f')' = f\,(f/f')' + f > 0$, if x is sufficiently
large.
It is obvious that a value $n_1 > 0$ exists, for which $u_{n_1,m} = c$, so that
$$n_1 - m - \frac{(m - 1)F(c)}{1 - F(c)} + \left(\frac{Ff'}{f^{2}}\right)_{c} = 0.$$
Let $n > n_1$ and m fixed. Then the equation (20) has only one solution $u_{n,m} > c$. For $F/(1 - F)$
and $-Ff'/f^{2}$ steadily increase to +∞ if x → b. The statements concerning the behaviour of
$u_{n,m}$ by fixed n are immediately evident.

THEOREM 1.2.2. For sufficiently large n and small m, the modal value $u_{n,m}$ satisfies
approximately the relations
$$\left(\frac{f'}{f^{2}}\right)_{u_{n,m}} \approx -\frac{n}{m(1 + k) - k}, \qquad (21)$$
$$1 - F(u_{n,m}) \approx \frac{m(1 + k) - k}{n(1 + k) - k} \qquad (k > -1). \qquad (22)$$
If f(x) is of the exponential type, the formula derived by Gumbel (1935) is found:
$$1 - F(u_{n,m}) = \frac{m}{n}. \qquad (23)$$
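
Relation (23) can be illustrated numerically for one distribution of exponential type. The sketch below uses the unit exponential f(x) = e^{−x}, a choice made here for illustration because (f/f′)′ vanishes identically for it, and locates the mode of the density of the mth largest value by a golden-section search on its logarithm; n = 200 and m = 4 are arbitrary.

```python
import math


def log_density_mth_largest(x, n, m):
    """log of F^(n-m) (1 - F)^(m-1) f for f(x) = exp(-x), up to an additive constant."""
    big_f = 1.0 - math.exp(-x)
    return (n - m) * math.log(big_f) - (m - 1) * x - x


def mode_by_golden_section(n, m, lo=1e-6, hi=50.0, tol=1e-10):
    """Maximize the (concave) log-density on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if log_density_mth_largest(c, n, m) > log_density_mth_largest(d, n, m):
            b = d  # maximum lies in [a, d]
        else:
            a = c  # maximum lies in [c, b]
    return 0.5 * (a + b)


n, m = 200, 4
u = mode_by_golden_section(n, m)
excess = math.exp(-u)  # 1 - F(u) for the unit exponential
print(u, excess, m / n)
```

For this distribution the modal equation can also be solved by hand, giving 1 − F(u) = m/n exactly, and the search reproduces that value.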
Proof. In virtue of (13) we may write for (20) after some calculation
idera- | (2) 0 es) ae —1)(1+k+y(u (24) 
Pun Lk Wu) (m—1) (1+ k+y(u =r , 
posing fy 
' whereas lim y(x) = lim (,:) —-k=0. 
ato sto \f'/2=E2) 


For sufficiently large n, we may write 


nation (fi) og 8 n(1+k)— i! 1+(m—1)(1+&)[n(1+k)+1- 








aM w), (mer) 








f? ~1+km(1+k)—k (+ & [m(14 k)—k][n(l+k)—- 
(19) | 1 nl+kh—- m—1 
Tea ml +hy—kY w aie ahi 
ng the _ _ml+k)— 4n _ 
ee eee "ll -aspereca seiiitide 


29 Biom. 45 
As a result of eliminating $(f^{2}/f')_{u_{n,m}}$ between (13) and (20) we obtain

$$1 - F(u_{n,m}) = \frac{(m - 1)[1 + k + \psi(u)] + 1}{(n - 1)[1 + k + \psi(u)] + 1} \approx \frac{m(1 + k) - k}{n(1 + k) - k} + \frac{(m - n)}{[n(1 + k) - k]^{2}}\,\psi(u). \qquad (25)$$

The formulae (21) and (22) hold good if the terms with ψ(u) may be neglected with respect
to the unity and |k| ≪ n(1 + k).


Remark. With the aid of (24) and (25) we may show $\left(\dfrac{d^{2}M_{n,m}(x)}{dx^{2}}\right)_{x = u_{n,m}} < 0$, if n is sufficiently
large. Then $M_{n,m}(u)$ is a maximum.


1.3. The behaviour of $u_{n,m}$ in case of increasing n or m

We suppose n and m to be continuous real variables. Let n be sufficiently large and m fixed,
but small with respect to n. Then we deduce from (20) by applying the formulae (13), (24)
and (25) (the index n, m has been omitted)


a {n(l+k)—k}P | my(u) n 2 
dy ~ 1m La 1+k)—k}(1+h) + mam to (RY) 3 (26) 


f ) 
v( ) (; u={(u) 
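If the quantity ψ(u) may be neglected with respect to unity for sufficiently large n and
|k| ≪ n(1 + k),
$$\frac{du}{dn} \approx \frac{m(1 + k) - k}{f(u)\,n^{2}(1 + k)} \qquad (k > -1), \qquad (27)$$
or, according to (21),
$$\frac{du}{dn} \approx \left(\frac{f^{3}}{f'^{2}}\right)_{u}\frac{1}{[m(1 + k) - k](1 + k)}. \qquad (27a)$$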





Let $\lim_{x\uparrow b} f^{3}/f'^{2} = r$ exist. Then we may easily show that r = 0 in two cases, namely if b is finite
and f(b) = 0, or if b = ∞ and values s > 2 exist, so that $x^{s}f(x) \to 0$ for x → ∞. Further r = ∞
if b = ∞ and values s < 2 exist, so that $x^{s}f(x) \to \infty$ for x → ∞. In certain cases, like
$f_1(x) = p^{2}x^{-2}$, we have transitions, then $r = \tfrac{1}{4}p^{2}$. Hence we may state:

THEOREM 1.3.1. If n is sufficiently large and k > −½ and therefore f(x) tends to zero more
rapidly than $f_1(x) = p^{2}x^{-2}$, du/dn is a positive steadily decreasing function by increasing
n, while $\lim_{n\to\infty} du/dn = 0$. If −1 < k < −½, so that f(x) tends to zero more slowly than $f_1(x)$,
du/dn is a steadily increasing function, while $\lim_{n\to\infty} du/dn = \infty$. If k = −½ it is possible that
$\lim_{n\to\infty} du/dn$ is finite or ∞ (e.g. if $f(x) = (x^{2}\log^{q} x)^{-1}$ (q > 0), this limit equals zero; if
$f(x) = x^{-2}\log^{q} x$ (q > 0), this limit is infinite and for $f_1(x)$, $p^{2}/(m + 1)$).





Similarly, for a fixed value of n and variable m > 1 follows (see (20) and (25)) 


du du{(n—1)(1+k+y(u)) +1 
dm ~~ dn SSeS Ermita 


The formula for du/dn is given in (26). 





(28) 








fo 


24) 


(28) 
The approximate formula for du/dm is (see also (21) and (27))

$$\frac{du}{dm} \approx \left(\frac{f}{f'}\right)_{u}\frac{1}{m(1 + k) - k} \qquad (n \text{ fixed},\ k > -1). \qquad (29)$$
It may be easily shown that if f(x) has a finite upper bound, $\lim_{x\uparrow b} f/f' = 0$ (k > 0). Further, if
f(x) is of the Cauchy type, $\lim_{x\to\infty} f/f' = -\infty$ (−1 < k < 0). Hence:

THEOREM 1.3.2. The function du/dm is negative for u sufficiently near b. If f(x) has
a finite upper bound, $\lim_{u\uparrow b} du/dm = 0$, and it equals −∞ if f(x) is of the Cauchy type. However, in
case f(x) is of the exponential type, $-\infty \le \lim_{u\uparrow b} du/dm \le 0$.

1-4. The maximum value of $M_{n,m}(x)$

From (15), (24) and (25) it follows after some calculation that 





anni -(5) (Co) p-S=2 Sef ostsvoy 


Here $f'/f$ is introduced instead of f in virtue of the fact that in practical applications
$f'/f$ changes more slowly than f as x increases.
If p?/n, m?/n and (m—>p)/n are sufficiently small we may write for (30) (see also (18)) 








M,, m(tt) = — (). aT e-m+p (: -2)" (1+k+y(u)) (1+ fy) (31) 
fy "P +o(4) +0("), (44 <€) 


Finally if we suppose that n (and therefore w) is so large that | y(u) | <1+k, we write 
for (31) : 
M,, »(u) = — f (1+k) mshi} y — asB) O + fg) (32) 

_ fr, (m—1)! m(1 +k) 3 


k)?—k? — 2k 
Pat + VW) gage TOW). (Ma<6) 





In the case of the extreme value itself (m = 1) follows from (31) 


(tu) = — (5) e“H47(1 + py), (33) 


1+ 2 4 
ww? +0(5), (uy <6). 





n2 


If ~, and “4, may be neglected, we may write 





My, m(U) ® — (G).a +k) ma f she ws 5) uo (34) 
M,,(u) x — (5) ew. (35) 
1-5. Example


As an example we determine approximate formulae for the maximum of the extreme
values in case of the normal distribution $f(x) = (2\pi)^{-1/2} e^{-x^2/2}$. From (13) and (14) follows





1.2, 30 1 1.3 1.3.5 
a += 4 = ~,,.: =—-— —... 36 
(2) a2 a bi a6 te | p(x) a2 at 7 6 ? ( ) 
so that in virtue of (35)

$$M_n(u) = u \exp\left(-1 + \frac{1}{u^2} - \frac{3}{u^4} + \frac{15}{u^6} - \ldots\right). \qquad (37)$$
According to (24) the following relation exists between u and n 
' 3 15 
n= (2z)3 wes“ +—— a 78 (38) 


To judge the accuracy, we have computed the values of u and $M_n(u)$ from the various
approximations which follow from (37) and (38) by omitting the smaller terms.

For comparison the exact values of u and $M_n(u)$ have been determined in the case
$n = 3^s$ (s = 3, ..., 10) from the graphs of Figs. 2 and 3. These graphs also show the behaviour
of $M_n(x)$ according to the exact formula (15a).


A sufficient agreement appears to exist between the approximate values of u, determined
from

$$n = (2\pi)^{1/2}\, u\, e^{u^2/2} \qquad (38a)$$

and the values determined from the exact graphs.
Concerning the values of $M_n(u)$ computed from the various approximation formulae, the
deviations are more important. In Table 1 are mentioned the values of $M_n(u)$ computed
from

(a) $M_n(u) = n f(u) F(u)^{n-1}$,
(b) $M_n(u) = u \exp\left[-1 + \frac{1}{u^2} - \frac{3}{u^4}\right]$,
(c) $M_n(u) = u \exp\left[-1 + \frac{1}{u^2}\right]$,
(d) $M_n(u) = u e^{-1}$,
(e) $M_n(u) = u \exp\left[-1 + \frac{1}{u^2} - \frac{3}{2u^4}\right]$.

Further have been mentioned the values of the neglected term $\mu_2 \approx (2n)^{-1}$.


Table 1. Normal distribution: values of u, $M_n(u)$ and $\mu_2$

  n        u       (a)     (b)     (c)     (d)     (e)     $\mu_2$
  3^3      1.87    0.83    0.71    0.91    0.69    0.81    0.02
  3^4      2.3     0.97    0.92    1.02    0.85    0.97    0.006
  3^5      2.68    1.09    1.07    1.13    0.99    1.10    -
  3^6      3.02    1.21    1.20    1.24    1.11    1.22    -
  3^7      3.34    1.32    1.31    1.34    1.23    1.33    -
  3^8      3.63    1.42    1.41    1.44    1.33    1.43    -
  3^9      3.9     1.52    1.51    1.53    1.44    1.52    -
  3^10     4.16    1.61    1.61    1.62    1.53    1.61    -
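The entries of Table 1 can be recomputed with a short script. The following is only a sketch, assuming the readings of (38a) and of the formulae (a) to (e) as $n = (2\pi)^{1/2} u e^{u^2/2}$, $n f(u) F(u)^{n-1}$, $u\exp[-1 + u^{-2} - 3u^{-4}]$, $u\exp[-1 + u^{-2}]$, $u e^{-1}$ and $u\exp[-1 + u^{-2} - \tfrac{3}{2}u^{-4}]$; the function names are hypothetical and not taken from the paper.

```python
import math


def normal_pdf(x):
    """Standard normal density f(x)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def normal_cdf(x):
    """Standard normal distribution function F(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def modal_value(n, lo=0.5, hi=40.0, iterations=200):
    """Solve n = sqrt(2*pi) * u * exp(u**2 / 2), read here as (38a), by bisection."""
    target = math.log(n)

    def g(u):
        # Logarithm of the right-hand side minus log(n); increasing in u.
        return 0.5 * math.log(2.0 * math.pi) + math.log(u) + 0.5 * u * u - target

    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def approximations(n):
    """Return u and the approximations (a)-(e) of M_n(u) for sample size n."""
    u = modal_value(n)
    return {
        "u": u,
        "a": n * normal_pdf(u) * normal_cdf(u) ** (n - 1),
        "b": u * math.exp(-1.0 + 1.0 / u**2 - 3.0 / u**4),
        "c": u * math.exp(-1.0 + 1.0 / u**2),
        "d": u * math.exp(-1.0),
        "e": u * math.exp(-1.0 + 1.0 / u**2 - 1.5 / u**4),
        "mu2": 1.0 / (2.0 * n),  # neglected term, about (2n)^-1
    }


if __name__ == "__main__":
    for s in range(3, 11):
        row = approximations(3**s)
        print(s, " ".join(f"{row[key]:.3f}" for key in ("u", "a", "b", "c", "d", "e")))
```

Under these assumptions the computed values agree with the tabulated ones to about one unit in the second decimal, which is a useful check on the reading of the garbled formulae.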
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The values according to (d) appear to differ too much from the exact ones. A difference 
smaller than 1 % will be obtained if w > 10 and therefore n > 3“. If n > 3’, we may apply (c), 
while (b) gives sufficiently accurate values for n > 3°. A further approximate formula of this 
kind, e.g.
$$M_n(u) = u \exp\left[-1 + \frac{1}{u^2} - \frac{3}{u^4} + \frac{15}{u^6}\right],$$
is still more unfavourable for small u, because the applied expansion has an asymptotic
character. According to the table the formula (e) in which the coefficient of $u^{-4}$ has been


taken appears to give the most favourable approximation for n > 3°. 
If we consider the initial distribution

$$f(x) = c\,e^{-dx} \qquad (c, d > 0),$$

for $x > a$, it appears that $k = 0$ and $\psi(u) = 0$. Then u and $M_{n,m}(u)$ are determined by the
formulae (see (24) and (34))


du i m™ —m 
€ and Mal) = 4 in* , 


provided pz ~ m/(2n) is sufficiently small with respect to unity, see (33). 


1-6. The behaviour of the maximum value of $M_{n,m}(x)$ for increasing n or m


At first we suppose m fixed, n variable and sufficiently large and m small with respect to n, 
so that (34) may be applied. 
Then we write shortly 


$$M_{n,m}(u) = -\left(\frac{f'}{f}\right)_u g(m,k), \qquad g(m,k) > 0 \quad (m \ge 1,\ k > -1). \qquad (39)$$


In virtue of the behaviour of f’/f as x increases, we may state: 


THEOREM 1-6-1. If f(x) has a finite upper bound, $\lim_{u \uparrow b} M_{n,m}(u) = \infty$ $(k > 0)$. If f(x) is of the
Cauchy type, this limit is zero $(-1 < k < 0)$. However, if $k = 0$, and therefore f(x) is of the
exponential type, $\lim_{u \to \infty} M_{n,m}(u)$ is zero, finite or infinite, depending on f(x). In this case:

(a) Let f(x) tend to zero as rapidly as, or more slowly than
$e^{-ax^r}$ $(r < 1,\ a > 0)$.
Then $\lim_{x \to \infty} f'/f = 0$ and therefore $\lim_{u \to \infty} M_{n,m}(u) = 0$.

(b) Let f(x) tend to zero as rapidly as, or more rapidly than
$e^{-ax^r}$ $(r > 1,\ a > 0)$.
Then $\lim_{x \to \infty} f'/f = -\infty$ and therefore $\lim_{u \to \infty} M_{n,m}(u) = \infty$.

(c) If f(x) tends to zero like
$\exp\{lx + Q(x)\}$ $(l < 0,\ \lim_{x \to \infty} Q'(x) = 0)$, then $\lim_{x \to \infty} f'/f = l$ and
$$\lim_{u \to \infty} M_{n,m}(u) = \frac{-l\, m^m e^{-m}}{(m-1)!}.$$

From (15), (19) and (25) follows for variable n and m fixed

din M(u) d —1\]dn 

oe) = |In rae ne We 
- 2 2 

a k m(1+k) v(u) ov ] dn al 





wept mene + in(l+k)—k] +h) m \du’ 
while dn/du has been determined in (26). Then we have also applied the Stirling asymptotic 


series for the expansion of Inn! and the well-known expansion for 


— ee 
ar eS 9 Sa: 





Further let |k| <n(1+k). 
From (21), (26), (39) and (40) follows immediately: 


THEOREM 1-6-2. (a) Let f(x) have a finite range (k>0). Then for large m and hence u 





sufficiently near b dM(u) (f’\? 
_—* (5) tom, k)>0 
dM (u) 


whereas for n > oo and thus u + b, lim = 00 (we have omitted the index n,m). 
utb 


(b) If f(x) belongs to the Cauchy type (—1<k<0), then 











ie <0 and lim oe) =0 
du con a , 
(c) If f(x) belongs to the exponential type (k = 0), in virtue of (21), (26), (34) and (40) 
follows dM(u) f m™e-m 
qu Vs (u i[a+(Fa) v4 vu) re (41) 


For the normal distribution it appears that for large values of u

$$\frac{dM(u)}{du} \approx \frac{m^m e^{-m}}{(m-1)!}.$$
Then in virtue of (36) we have put (uw) x u-®. This result may also be derived from (34). 


Analogously we may formulate a similar theorem about the behaviour of $M_{n,m}(u)$ if m is
variable and n is given, using the following formula (n is fixed and m > 2)


din My, m() _fd n—1 1—F(u)| dm 
=e =)" Lami” (pa) +8 Fu) IF du’ () 


The formula for dm/du is given in (28). 





2. APPROXIMATE FORMULAE FOR $M_{n,m}(x)$
2-1-0. The first approximate formula


From a practical point of view this formula may already be applied for values of n which
are not large (e.g. n = 50).
Instead of (15) we write

$$M_{n,m}(x) = n\binom{n-1}{m-1} f(x)\{1 - F(x)\}^{m-1} \exp\{-(n-m+1)(1 - F(x))\}\,[1 - \beta(x, n-m+1)]. \qquad (43)$$











If $|\beta(x, n-m+1)| < \epsilon$ for $b > x > x_{q,\,n-m+1}$ (the upper bound may be finite or infinite), we
obtain the approximate formula

$$M_{n,m}(x) = n\binom{n-1}{m-1} f(x)\{1 - F(x)\}^{m-1} \exp\{-(n-m+1)[1 - F(x)]\}. \qquad (44)$$

With m = 1 the corresponding expression for the initial distribution of the extreme value is

$$M_n(x) = n f(x) \exp\{-n[1 - F(x)]\}. \qquad (45)$$
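The quality of (45) for moderate n can be illustrated numerically. The sketch below assumes a standard normal initial distribution and compares the exact density of the largest value, $n f(x) F(x)^{n-1}$, with the approximation $n f(x)\exp\{-n[1 - F(x)]\}$; the function names are illustrative only.

```python
import math


def pdf(x):
    """Standard normal density f(x)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def cdf(x):
    """Standard normal distribution function F(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def exact_density(x, n):
    """Exact density of the largest of n values: n f(x) F(x)^(n-1)."""
    return n * pdf(x) * cdf(x) ** (n - 1)


def approx_density(x, n):
    """First approximate formula, read here as (45): n f(x) exp(-n [1 - F(x)])."""
    return n * pdf(x) * math.exp(-n * (1.0 - cdf(x)))


def relative_error(x, n):
    """Relative deviation of the approximation from the exact density."""
    exact = exact_density(x, n)
    return abs(approx_density(x, n) - exact) / exact


if __name__ == "__main__":
    for x in (1.5, 2.0, 2.25, 2.5, 3.0):
        print(f"x = {x:4.2f}  relative error = {relative_error(x, 50):.4f}")
```

For n = 50 the deviation is below one per cent near the modal value and shrinks further in the upper tail, while it grows towards smaller x, in line with the restriction of the formula to an interval below b.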


If $m^3/6n^2$ is sufficiently small, we may approximate the factorial in virtue of (18). So the
following approximation is obtained for (15)

$$M_{n,m}(x) = \frac{n^m}{(m-1)!} \exp\left[\frac{-m^2 + m}{2n}\right] f(x)[1 - F(x)]^{m-1} \exp\{-(n-m+1)[1 - F(x)]\}. \qquad (46)$$
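The factorial approximation behind (46) can be checked directly. The sketch assumes that the step amounts to replacing $n\binom{n-1}{m-1}$ by $\frac{n^m}{(m-1)!}\exp[-(m^2 - m)/(2n)]$; this reading, and the function names, are assumptions made for illustration.

```python
import math


def exact_coefficient(n, m):
    """Exact leading coefficient n * C(n-1, m-1) = n! / ((m-1)! (n-m)!)."""
    return n * math.comb(n - 1, m - 1)


def approx_coefficient(n, m):
    """Assumed approximation: n^m / (m-1)! * exp(-(m^2 - m) / (2n))."""
    return n**m / math.factorial(m - 1) * math.exp(-(m * m - m) / (2.0 * n))


def relative_error(n, m):
    """Relative deviation of the approximate coefficient from the exact one."""
    exact = exact_coefficient(n, m)
    return abs(approx_coefficient(n, m) - exact) / exact


if __name__ == "__main__":
    for n, m in ((100, 3), (1000, 5), (1000, 20)):
        print(n, m, relative_error(n, m), m**3 / (6.0 * n**2))
```

The relative error is of the same order as $m^3/6n^2$, which is the quantity the text requires to be small.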





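Finally if $m^2/2n$ is sufficiently small we may write

$$M_{n,m}(x) = \frac{n^m}{(m-1)!} f(x)[1 - F(x)]^{m-1} \exp\{-n[1 - F(x)]\}. \qquad (47)$$

Then the modal value u (or $u_{n,m}$) satisfies the equation (compare (20))

$$\left(\frac{f'}{f}\right)_u - \frac{(m-1) f(u)}{1 - F(u)} + n f(u) = 0. \qquad (47a)$$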


2-1-1. The interval of application


In connexion with the neglect of the function $\beta(x, n-m+1)$ in (43), two theorems are
formulated about the behaviour of this function.

To clarify the practical meaning of these theorems, in Fig. 1 graphs are reproduced
showing the relation between $\beta(x, p)$ and $1 - F(x)$ in case $p = n-m+1 = 6, 8, \ldots, 50$ (see the
formula (48)).

From (15) and (43) it follows that

$$\ln[1 - \beta(x, n-m+1)] = (n-m+1)(1 - F(x)) + (n-m)\ln F(x). \qquad (48)$$
In virtue of In F=ins— , v= qr deduce 
- ~ F(x)\3 
In(1—f) = ae [2—(n—m+1)(1—F(x))]—3(n—m) (Fa) (1+), (49) 


__—(L-F())P_ eg B(L- FP 
5(1+ F(a))? —23(1— F(x))* 20F(x) ~ 
THEOREM 2-1-1-1. (a) A value $x_{0,p}$ $(p = n-m+1)$ exists for which $\beta(x_{0,p}) = 0$. The value of
$F(x_{0,p})$ may be deduced from the equation
$$(n-m+1)(1 - F(x)) + (n-m)\ln F(x) = 0,$$
by means of the well-known expansion in Lagrange series. Then follows

$$1 - F(x_{0,p}) = \frac{2}{n-m} - \frac{8}{3}\frac{1}{(n-m)^2} + \frac{28}{9}\frac{1}{(n-m)^3} + O\left(\frac{1}{(n-m)^4}\right). \qquad (50)$$

(b) The function $\beta(x, p)$ is negative for $x_{0,p} < x < b$, but for all values of $x < x_{0,p}$, $\beta(x, p)$ is
positive and a steadily increasing function as x decreases.
Fig. 1. The relation between $\beta(x, p)$ and $1 - F(x)$ for small values of p.

(c) For a value $v_p$, defined by
$$1 - F(v_p) = \frac{1}{n-m+1} \qquad (x_{0,p} < v_p < b); \qquad (51)$$
$\beta(x, p)$ has a minimum value.
From (48) follows 
1 1 6 


Bey) = — 55am 41 6Qn—2m+41)*~ n—2m+1) 








(0<6<1). (52) 


(d) Finally, let x be given and let »—m increase. Then A(x, p) increases likewise, while 
Ble) < P(e, p) <1. 
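The expansion (50) can be compared with a direct numerical solution of the defining equation. The sketch below writes $q = n - m$ and $y = 1 - F(x)$, so that the equation reads $(q+1)y + q\ln(1-y) = 0$, and assumes the series $2/q - 8/(3q^2) + 28/(9q^3)$; the names are hypothetical.

```python
import math


def root_one_minus_F(q, iterations=200):
    """Positive root y = 1 - F of (q + 1) * y + q * log(1 - y) = 0, with q = n - m."""

    def g(y):
        return (q + 1) * y + q * math.log1p(-y)

    lo = 1.0 / (q + 1)  # g attains its maximum here, so g(lo) > 0
    hi = 1.0 - 1e-12    # g(hi) < 0 for every q >= 1
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def series_one_minus_F(q):
    """First three terms of the assumed Lagrange series (50)."""
    return 2.0 / q - 8.0 / (3.0 * q**2) + 28.0 / (9.0 * q**3)


if __name__ == "__main__":
    for q in (5, 10, 50, 200):
        print(q, root_one_minus_F(q), series_one_minus_F(q))
```

The lower bracket $1/(q+1)$ is the point at which $\ln(1 - \beta)$ is largest, that is, the value $1 - F(v_p)$ of part (c), so the root found always lies on the side $x_{0,p} < v_p$.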


THEOREM 2-1-1-2. (a) Let p, = n,—m,— 1 bedefined by A(v,,) = —é;é> 0is the maximum 
value of ¢ which may be neglected with respect to unity. Let furthermore p>, and 
Xq,p < Xo,» be determined by f(x, ,) = € or by the approximate expression 


1—Flqy) = 5+ [1—2pln (1-2) (p =n—m+}). (53) 


Then we have | A(x, p)| <é, if x,,,<x<b. Consequently for this interval the approximate 
formulae (44) and (47) hold good. In virtue of (53) for large values of n and small m the 
lower boundary value of M,, ,,(x) is approximately (see (47)) 


My nl gap) eae [1+ (1+ 2né)t}"—4 exp [—1—(1+2nz)4], (54) 


(b) Let p<p,, so that | A(v,)| >é. 
Then | A(x, p) | <é for two intervals (x, ,,%,,,) and (%,,,6), whereas 





Xq, Pp 


on B(Xq,p) =e é, A(z, p) = B(%1,p) = —€é, 


< Xo,» <%p,p < Up <M p<, 
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Proof (a). According to theorem 2-1-1-1(c), #(z,p) has a minimum for x = v,. Hence 
| B(x, p) | <éforp>p, anda, ,< «<b. Moreover, x, , <,,, in virtue of theorem 2-1-1-1(a). 
The approximate formula (53) may be determined from (49) by assuming that x is 


sufficiently near b, so that approximately 


2In(1—€) = [1—F(2,,,)][2—p(1— F(z, p)1. 
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The validity of the corresponding statement in (b) is also obvious in virtue of | A(v,) | > 
for p < p,. Then the interval (x, ,,,b) is separated into three intervals (x, ,, 25), (%s,p» %,p) 
and (2,,,,6) whereas | f(x, p)| <é for the first and the last interval, but | f(x, p) | > for the 


interval (x, ,,, %,,) (see Fig. 1, case » = 8). 


2-1-2. The interval contains the modal value if m?/n is small 


The maximum value of $M_{n,m}(x)$ may be calculated by (44) or (47), if the modal value
$u_{n,m}$ is situated in the interval $(x_{q,p}, b)$. Therefore the required condition follows from
$$1 - F(u_{n,m}) < 1 - F(x_{q,p}).$$
We shall suppose that $|\beta(v_p)| < \epsilon$, so that $1 - F(x_{q,p})$ is determined by (53).
Further $1 - F(u)$ (we omit the index n, m) follows from (25).


If y(u) is small m<n and | k|<n(1+k), we may write 


q.p? 





1—F(u) = 7 (m )- v(u) “= 11 —2pIn (1-2) 





n\ 1+k) n(1+k)? pp? 
ie 
of pee 2(n —m) él. 
or 1 Plu) <> +7 (1+ (n—m) €] 
The condition, Up, > %_q,»» has been satisfied if 
k y(u) 9 at : = 
TF ae tee , Sere +t. (55) 


In general we may conclude from (55), that it is necessary for m?/n or m?/(2n) to be sufficiently 
small. 
For the initial distribution of the extreme value (m = 1), it follows from (25) if k = 0 


1 
1—F(u)x 5 —y(u)). 
Therefore from (51) and lim y(u) = 0, it follows that the difference between u and v, 
utb 


decreases with increasing n. 
In the case of the normal distribution (see (36))


F(u) — F(v,)® —— x-- 


so that the maximum value of $M_n(x)$ may be computed by (45) even for small values of n,
e.g. n = 25, or smaller.

As another example we mention the distribution f(z) = «~*. Then 1 — F(u) = 2/n, so that 
in virtue of (50) uxa, ,, while A(x, ,) = 9. 


2-2. The second approximate formula for $M(x)$
In this section the method for deriving approximate formulae is different from the one of 
the next section, which is based on the development of the function 1 — F(x) of § 0-5. 





462 Approximate formulae for the statistical distributions of extreme values 


The starting-point for the derivation of these formulae is (47), supposed that m?/(2n) is 
sufficiently small. This formula holds good for x, ,<a<b (see §2-1-1), while this interval 
contains w,, », (see § 2-1-2). We may write 


M bat 
(x) _ fe) ‘ Pe) reer exp {—n[F(u)—F(a)}} (58) 


M(u) fu) 


and further 





(u)— F(x)" _ 1+v/(1+) 
pe | = exp|(m —1)In reared 
- s yp \2p+t 
= exp|2( m—1) ) =, opti (75) | (m> 1) (57) 
where we have put _M(u)— F(z) _ F(u) — F(x) 








2—F(u)—F(z) 1+v’ °~ 21—F(u)]’ 
It can be shown that | v/(v+1)|<1 if a<b. 
Hence in virtue of (47a) and after some computation we may write 


M : 2p+1 
(2) =F exp I(&)., [F(u) —F(x)]— 2(m—1) + 2(m -0 ¥ sala) 9 | (58) 


For b>x>x,, we often consider an approximation taking some terms of the series. 
Now we derive a further approximate formula for (58). Then we put f(x) = e* and 
introduce the following series of Taylor, which holds good for |x—w| <6 
h(x) —h(u) = h’(u) (w—u) + $h"(u) (w—u)? +... 
hu) 
(r—1)! 





M(u) fu) 


it 





+ (c—uy43+—>(x—u) (h'=f'/f). (59) 


Further, we put 





; | me Diu i 2(n 
F(x) = F(u)+f(u)(x—u)+... +> = ) e— uy +——_—__— =>) D te «yt (60) 


(u<{,y<a or «<d,y<u). 


After substitution in (58) we find after some computation 








M(x —u)? —u)? as 
Mia = exp|a, a Penh me +... 40,09) wf (61) 
Then it appears that 
=((f) _f)__m-) | 
™ (7) (5) GFP], (62) 


2-6) Gye MREPEL a 
We may expect that a, < 0, for in virtue of (1) his 
() EN - 
Further in virtue of (13) 


2 
oa = +k+ uP ( 


~(£) +e ; Me 
(FG) 1+ k—es(wy) im gy(0 


fY «a: ™ 
*) (im yu) = 0), 





Therefore for sufficiently large x
a= —(f'/f)2(1+k)[m(1+k)—k]<0, (m>1). 


In case of the initial distribution of the extreme value (m = 1) the coefficient a, equals 


a hers ° 


Example. According to (45), for the normal distribution we may write if x >, , 








M,(x) = (2n)-4nlexp (32?) -n|4- (2n)-4"e4 ar). (64) 
0 
In virtue of (38a), (61) and (63) it follows that 
M,,() aS 2 ay\2 vad — (~—u)? 
U,(u)~ exp|—1(w x) (ax — wu) tua — 1)?-1H,,_,(u) = . | (65) 
Then $H_r(u)$ are the well-known Tchebycheff-Hermite polynomials (see Kendall (1947)):
$$H_2(u) = u^2 - 1; \quad H_3(u) = u^3 - 3u; \quad H_4(u) = u^4 - 6u^2 + 3; \quad \text{etc.} \qquad (66)$$


Practically, the formula (65) has been applied for the cases $n = 3^s$, s = 3, ..., 8, considering
the terms with H, and H, of the series. Further, we have applied the formula (e) of §1-5 for 
the computation of M,,(u). The results of these computations are shown in Fig. 2 in compari- 
son with the graphs computed from the exact formula (15a). Then it appears that the 
differences between the approximated graphs and the exact ones are sufficiently small. Only 
in the tails are the deviations more important. 


2-3-0. Approximate formulae for $M_{n,m}(x)$ based on the formulae for $1 - F(x)$ of § 0-5.

The starting-point for the deduction of this formula is again the formula (47) which holds 
good for the interval (x, ,,, b) (see § 2-1-1). Further for 1— F(x) the formula (12) is applied, 
supposing that the initial distribution f(x) satisfies the various conditions required for the 


deduction of this formula. 
We suppose that the series } ¢,,(x) converges or will be an asymptotic expansion. In this 
n=1 


last case if x is fixed, the terms decrease for n = 1,...,p. But later on (n = p+1,...) they 
increase rapidly and without limit. This fact can be applied for approximation of 1— F(z), 
but it is not possible to demand any degree of accuracy whatever. The degree of accuracy 
cannot be improved below the value of the last considered terms of the series. 

In this case may be written (see (12) and (47)), if a gl is as small 


n= (a) tint GY oem el 
Wg: ane zal (7),2+ bios rote  —- 


In practical applications the case p = 2 is important. Then see (1) 


éi(a) = - (4) +& (img) - 


i) 





Let | Po(x)| <é|A,(x)| or (see (10)) € 
Fig. 2. Exact and approximated graph for the normal extremes.
Solid line: $M(x) = n f(x) F(x)^{n-1}$, $F(x) = \int_{-\infty}^{x} f(x)\,dx$.


Approximate formula: 








M(x) _ w+) u a ae. Fe a d 

aa p| - FE (eye % (ut 1) (e—u)?— (wt Buy(a uy |; 
1 3 

sis Blea , 


Let further | Y(x)| =e(1— if 
Then it follows from (67) 


ja Pa = - (FJ e| al (7) [»- (4) |. (68) 


2-3-1. Example 
Again the normal distribution is taken into consideration. Then from (68), with k = 0, 
follows 


My, m(x) = ((27)-* ny” 


Lop <%<2<b and |k|<1. 





My, mix) = 


1 1 41\"-1 a2 ss : 
wana) mo[-me-eraG-2) or -F] 


In virtue of (38a) we may further write if x, , <x <0 (see 2-1, 2) 


rn Di\z 23) SXP|™ 9 x 23)°*P 2 |: 


My, n(x) = 





(69) 














The corresponding formula for the initial distribution of the extreme value itself is

$$M_n(x) = u \exp\left[\frac{u^2 - x^2}{2} - \frac{u}{x}\left(1 - \frac{1}{x^2}\right) \exp\left(\frac{u^2 - x^2}{2}\right)\right], \qquad (70)$$

so that $M_n(u) \approx u \exp\left(-1 + \frac{1}{u^2}\right)$.


Fig. 3. Exact and approximated graphs for the normal extremes (curves labelled $3^{10} \approx 59{,}000$ and $3^{9} \approx 19{,}680$).
Solid line: exact graph (see also Fig. 2).
Approximate formulae:
----, formula (70);
++++, formula (71);
o o o, $M(x) = u \exp[-u(x - u) - e^{-u(x - u)}]$.


In § 1-5, this formula has been marked by (c). By the numerical application it appears that
(c) may be applied if $n > 3^7$, and a deviation smaller than 2 % will then be obtained. In Fig. 3 the
graphs of $M_n(x)$ are shown in case $n = 3^s$ (s = 4, ..., 7), computed according to (70). In this
figure the exact graphs of $M_n(x)$ computed from (15a) are also shown. Then it follows that
with increasing n the deviations between the approximated graphs and the exact ones
become smaller.
Especially for smaller values of n the formula 


uz—a2 wu re Ss uu? — 22 
M,,(x) = wexp| 5 -E(1-4+5u) exe 5" | (71) 





x 


gives a more favourable approximation, in virtue of the fact that in this case $M_n(u)$ becomes
identical with the formula (e) of §1-5. However, for these small values of $n < 3^4$, (65) gives
a still better approximation (see Fig. 2).

For the larger values of n the differences between (70) and (71) become continually 
smaller. 


2-4. Limiting formulae for $M_{n,m}(x)$
2-4-1. Preliminary 
We deduce approximate formulae for f(x)/f(u), (f’/f),, and F(x) — F(u). 
In virtue of (f/f’), = k- 93(2), | lim 1 B1(2) = 0, it follows by integration that 


roo Sse», La 
or Fay = (ey) exp| | x) = 1+k(Z) (@—w), (72) 


fw) i), 
(Z) [swe 


E w)-(5). [su0 = 


‘a L) [sae 
().- @) ae re * bx(t) dt = (5) Se si 


Furthermore, in virtue of (72) 
F(z) —F(w) = fw) |" {e1e)"*exp| ['aewray|] a 
=F) eq teeonmm*—pexp| [acy dy] 


U<U(%) <2, 
In virtue of (24a) it may be deduced that 





if Aly) = 





Similarly, we derive 








m(1+k)—k 


nl rs “ash (l— — Zk+Dik) (74) 


Fe)- Flu) = [1+ ye) 4 ES wy] Mate 
y(z) = —1+exp If Bway}. 


It has also been assumed that the product of small terms of order y(x) y(w) may be neglected 
with respect to y(x) and y(u) themselves, and that |k| <n(1+k). 














2-4-2. The limiting formula for $M_{n,m}(x)$ in case f(x) is of Cauchy type; the formula of Fisher-Tippett


Owing to the fact that in practical applications $-(f'/f)_x$ decreases more slowly than the
function f(x) itself, when $-1 < k < 0$ and $c < x < \infty$, we write for (48) (the index n, m has been
omitted)
= iPS ch f r m—1 p—nll—Fi(a 
M(x) a4 mG). (F). [1— F(x x)] é nb. mM, (75) 


or in virtue of (13) 


M(x) = - men (5). (14%) [14 ve vo |u- F(a)]™ e-"0-Fean, (76) 


(lim y(x) = 0). 
xtb 
From (76) it follows that 





M(x) (f)\ (f F(u) — F(x) e-niF(w)—Fee 
ay” (7). 7),|!+ 1—F(u) . alll a ti 
sha i+ (x) — yu) 





if y(x) and y(u) are sufficiently small. After substituting the expressions mentioned in (25), 
(72) and (74) in (77), we deduce after some computation, neglecting the products of small 





t 
a eee A(x) 2A+4m— tik exp [Cte ky 1 — geen], (78) 
m Bs 
staff) eon 
Aa) = (1-+8(2) + PFE) (1+ p(e) (1-H exp[p(a) (aH, (79) 
k)—k 
pla) = y(2) + ve) py(e) = a) + Mw). 


It has been supposed that m*<n,|k| <n(1+k). 

For x > wit appears that z > 0in virtue of k < Oand (f’/f), < 0. If, however, x, ,<%<a, <u, 
zmight be negative. This case must be excluded, by choosing the lower boundary of the 
interval |—w|, for which (78) holds good, larger than z,. We remark that lim f’/f ='0. 


“2a 
From (78) and (79) follows: 


THEOREM 2-4-2-1: Let f(x) belong to the Cauchy type and satisfy the conditions of § 0-3. 
Then an interval (x1, 22), %p,q<%<UW< 2%, <0, exists for which we may put 


Ma gim(k+1)-kYk ex xp| - coe af” avon], (80) 
2=1+k(F) (ex—-u)>0, —-1<k<0, -o< tt <9, 


Eventually (f’/f),, may be approximated by —nf(u)/[m(1+k)—k] (see (21)). The formula 
for M(u) has been mentioned in (34). After some calculation and application of (34) we find 
for the distribution of the excess 


ff anceyae = 1—foxp[ 2H —E ro (PSE [LH MEP sas) (gy 
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If we put k = —1/l,1>1, (80) transforms into a more usual form 


Wag = zm1-)-lexp |(-m+55)) (it 0), z= 1-7 (5). @-» (80a) 


Let |(f/f’)2-—k| and therefore |¢,(x)| and |y(x)| decrease to zero by increasing 
x (¢<2%<oo). Then in virtue of the definitions of f(x) and y(x) (see (72), (74) and (79)), it 
follows that at the boundaries x, and 2, of the interval for which (80) holds good, M(x)/M(u) 
decreases if n increases. 

The formula (80) is a limiting expression for $M_{n,m}(x)$ if $n \to \infty$. For m = 1, (80a)
corresponds with the well-known limiting form of Fisher-Tippett.

Remark: If $f(x) = a x^{-l}$, $l > 1$, $a > 0$, $c_1 < x < \infty$, (80a) is an exact formula; then $z = x/u$.


2-4-3. The limiting formula of $M_{n,m}(x)$ if f(x) has a finite range
In an analogous way we may derive a corresponding formula for M(x)/M(u), if f(x) has 
a finite upper boundary and 0<k<oo. In this case we may write (see § 2-4-1) 


4 d 
fS-[-L aa tmal “ 


For the deduction of the corresponding limiting formula of $M(x)/M(u)$ we also may follow
the method of § 2-4-2. Then (80) may be derived again, but we have to put for z

$$z = \frac{b - x}{b - u}. \qquad (83)$$
E.g. (72) transforms into 


'b 
fe) = (P= 2)exp| [awrav|, eu) Ko—n)[ko—2)— [dyna], 





In virtue of the approximations requisite for the deduction of (80), this formula may be 
applied for a sufficiently small interval | w—« |; if f(2) = (b—2)*, (80) is an exact formula. 


Further on we may remark that for k > 0, lim (4) = —0. 


u> a Fim 


2-4-4. The limiting formula for $M_{n,m}(x)$ if f(x) is of exponential type; the formula of Gumbel
Now the distributions are considered for which k = 0 and the range is infinite ($b = \infty$).
Therefore we take the distributions f(x) which belong to the exponential type.
The formula (72) also holds good for k = 0. Then $z(x) = 1$ and

$$\lim_{k \to 0} (z(x))^{1/k} = \exp\left[\left(\frac{f'}{f}\right)_u (x - u)\right].$$


Analogously to (73) and (74) we may write 


f’\ [? 
i) (ff (F) Jae | ip : 
Gh-O.fttema| Geren 
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F(x) —F(u) = f(u) [fatex (4). ! -u+ [rw ay)} 
= =\|- 1+exp (5). («—w)|| ! + mn v(u)| [1+y,(2x)], (85) 


Yi(x) = —1+exp (5). [axa ay] (v<p(x%)<u or x«>p(x)>u). 


Uu 


After substituting the expressions (84) and (85) in (77), provided k = 0, the following 
theorem may be formulated: 


THEOREM 2-4-4-1. Let f(x) belong to the exponential type and satisfy the conditions 





of § 10-3. 
For x and u>x,,,, we may write (the index n,m has been omitted) 
Ma = p(x) exp [ — mt —m(e— 1)], (86) 
M(x) = [1+ 9, (x) + W(x) — (u)] [1 + 9(x) (1 —e') J" exp [ 9, (x) (l—e~*)], (87) 


t= —(F) ws ga) = Wwtyla); ale) = (m1) Yu) + my, (2). 


\y JU 


Again we have neglected the term (uw) y,(2). 
If 1 < (x) < 1+ for sufficiently large x and | u—« | sufficiently small, M(x)/M(u) may be 
approximated by 


$$\frac{M(x)}{M(u)} = \exp[-mt - m(e^{-t} - 1)], \qquad t = -\left(\frac{f'}{f}\right)_u (x - u). \qquad (88)$$


According to (21), approximately we may replace $(f'/f)_u$ by $-n f(u)/m$. In virtue of

$$M(u) = -\left(\frac{f'}{f}\right)_u \frac{m^m e^{-m}}{(m-1)!} \qquad \text{(see (34))},$$

$$M(x) = -\left(\frac{f'}{f}\right)_u \frac{m^m}{(m-1)!} \exp[-mt - m e^{-t}]. \qquad (89)$$

This is the well-known limiting formula for $M_{n,m}(x)$, deduced by Gumbel (1935).
In virtue of the supposition that | ¢,(x)| = | (f/f’)’ | and therefore also | (x) |, decreases 


to zero in the interval c < x < 00 for increasing x, at the bounds of the interval (x, x,) for which 
M,, (x) may be approximated by (89), M,, m(x)/M,,(u) decreases if m increases. In this 
sense (89) may be called a limiting expression for M,, ,,,(2). 

From (89) it follows for the distribution of the excess

$$\int_x^{\infty} M_{n,m}(x)\,dx = 1 - \{\exp[-m e^{-t}]\} \sum_{j=0}^{m-1} \frac{m^j}{j!} e^{-jt}. \qquad (90)$$
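The passage from (89) to (90) can be verified numerically. The sketch below works in the reduced variable t and takes $-(f'/f)_u = 1$, so that the density reads $\frac{m^m}{(m-1)!}\exp[-mt - me^{-t}]$ and the excess probability is $1 - e^{-\lambda}\sum_{j<m} \lambda^j/j!$ with $\lambda = me^{-t}$; both readings and all names are assumptions made for illustration.

```python
import math


def gumbel_density(t, m):
    """Density (89) in the reduced variable t, taking -(f'/f)_u = 1."""
    return m**m / math.factorial(m - 1) * math.exp(-m * t - m * math.exp(-t))


def excess_closed_form(t, m):
    """Assumed closed form (90): 1 - exp(-lam) * sum_{j<m} lam^j / j!, lam = m e^-t."""
    lam = m * math.exp(-t)
    partial = sum(lam**j / math.factorial(j) for j in range(m))
    return 1.0 - math.exp(-lam) * partial


def excess_numeric(t, m, width=40.0, steps=40000):
    """Composite Simpson rule for the integral of the density over [t, t + width]."""
    h = width / steps
    total = gumbel_density(t, m) + gumbel_density(t + width, m)
    for i in range(1, steps):
        weight = 4.0 if i % 2 else 2.0
        total += weight * gumbel_density(t + i * h, m)
    return total * h / 3.0


if __name__ == "__main__":
    for m in (1, 2, 5):
        for t in (-1.0, 0.0, 1.5):
            print(m, t, excess_numeric(t, m), excess_closed_form(t, m))
```

The agreement reflects the substitution $\lambda = me^{-t}$, which turns the integral of (89) into an incomplete gamma integral of order m.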

Remarks: 1. In case $f(x) = d\,e^{-cx}$, $d, c > 0$, (89) is an exact formula, but for other initial
distributions this formula is an approximation. Then often it is necessary to take n and
thus u very large, in order to obtain a favourable approximation for a sufficiently large
interval $|x - u|$.

As an example we mention the distribution of the extreme values (m = 1) taken from a
normal distribution. It converges very slowly toward the asymptotic distribution as sample
size increases. This also appears from the computed cases of Fig. 3. Even when $n = 3^{10}$ the
deviations are too large in comparison with the other approximate formulae.

This also appears from an examination of the function s(x) (see (87)). For the case of the 
normal distribution and m = 1 it follows from (38a) (84) and (85), that 3,(~) = (w—w)/# and 


x 

dt ex") 
i+ Y1(%) mel us e—u(a—u) * 
Applying the well-known asymptotic expansion (14), it appears after some investigation, 
which at this place we leave out of consideration, that only for very large values of w, 
a sufficiently large interval | 2 —u | exist, for which y,(x) and 3, (x) are sufficiently small. Then 
the values of (x) and y(w) may be neglected with respect to y,(7) and 3,(x). However, the 
convergence of the asymptotic distribution (89) can be improved by an adequate choice of 
the parameters (see de Finetti (1932) and Gumbel (1954)). 


2. We may deduce (89) from (80) by determining the limit for $k \to 0$.


3. Concerning the case when k = 0 and the upper bound b is finite, we may be brief.
According to the treatment in § 2-4-3 and § 2-4-4, the formula (89) may also be applied in this
case; provided for |x—u|<dé the deviation between (x) and unity is sufficiently small 
(see (87)). E.g. if f(x) = e-/©-, 0<a<b and w is sufficiently large we may unite 


ink eee A ee <i ee 
Mami) = Gy — Jy! op? | (buy SP aul 
THE SAMPLING VARIANCE OF CORRELATION COEFFICIENTS
UNDER ASSUMPTIONS OF FIXED AND MIXED VARIATES*


By JOHN W. HOOPER 


1. INTRODUCTION 


This paper is concerned with the asymptotic variances of canonical correlation coefficients 
under alternative assumptions about the stochastic nature of the variables. The results 
obtained also apply to the cases of zero-order and multiple correlation coefficients, since 
they are special cases of canonical correlations. The model and the assumptions are presented 
in §2, the results in §3, and some interpretations of the results in §4. In the Appendix 
detailed proofs of the results are given. 


2. THE MODEL AND ASSUMPTIONS 


We assume that there are M variables $y_1, \ldots, y_M$ and $\Lambda$ variables $x_1, \ldots, x_\Lambda$, where T vectors
of observations on these variables are available:

$$\{y_{1t} \ldots y_{Mt}\ x_{1t} \ldots x_{\Lambda t}\} \qquad (t = 1, \ldots, T). \qquad (2\text{-}1)$$

We shall consider the canonical correlations between the y's on the one hand and the x's
on the other. The assumption which underlies classical correlation theory is the following:

(i) The T vectors (2-1) are independent random drawings from a $(M + \Lambda)$-dimensional
normal parent with zero means.

An alternative approach can be based on a system of stochastic linear relations between
each of the y's and all of the x's:

$$y_{\mu t} = \sum_{\lambda=1}^{\Lambda} \pi_{\mu\lambda} x_{\lambda t} + u_{\mu t} \qquad (\mu = 1, \ldots, M;\ t = 1, \ldots, T), \qquad (2\text{-}2)$$

where $u_{\mu t}$ is a disturbance and $\pi_{\mu\lambda}$ a coefficient which is independent of t. The system (2-2),
for $\mu = 1, \ldots, M$, can be regarded as the reduced form of an econometric equation system,
where the y's are the jointly dependent and the x's the predetermined variables. Consider
then:

(ii) The $\Lambda T$ values $x_{\lambda t}$ are all non-stochastic real numbers. The T disturbance vectors
$\{u_{1t} \ldots u_{Mt}\}$ are independent random drawings from a M-dimensional normal parent with zero
means, and the $y_{\mu t}$ (for $\mu = 1, \ldots, M$ and $t = 1, \ldots, T$) are determined by (2-2), the $\pi_{\mu\lambda}$ being
parameters independent of t.

This is the case in which the x-variates are 'fixed variates'. We proceed to consider a more
general situation, viz., that of 'mixed variates', which covers assumptions (i) and (ii) as
special cases. Assume that

$$x_{\lambda t} = \xi_{\lambda t} + \omega_{\lambda t} \qquad (2\text{-}3)$$


* The author wishes to acknowledge his indebtedness to Prof. H. Theil for his helpful suggestions 
as regards the contents of this paper. Any errors which remain are the sole responsibility of the author. 
This study was done under the terms of a Fulbright grant while the author was a member of the 
Econometric Institute, Netherlands School of Economics. 
where $\omega_{\lambda t}$ is stochastic but $\xi_{\lambda t}$ non-stochastic and the $\omega$'s are independent. More specifically
we assume:

(iii) For $\lambda = 1, \ldots, \Lambda$ and $t = 1, \ldots, T$, $x_{\lambda t}$ is given by (2-3), where $\xi_{\lambda t}$ is a non-stochastic
real number and $\omega_{\lambda t}$ stochastic in such a way that the T vectors $\{\omega_{1t} \ldots \omega_{\Lambda t}\}$ are independent
random drawings from a $\Lambda$-dimensional normal parent with zero means. These vectors are
independent of T disturbance vectors $\{u_{1t} \ldots u_{Mt}\}$, which are themselves independent random
drawings from a M-dimensional normal parent with zero means. The $y_{\mu t}$ are given by (2-2),
the $\pi_{\mu\lambda}$ being parameters independent of t.

It is easily seen that (iii) contains (i) and (ii) as limiting cases: if $\omega_{\lambda t} = 0$ for all pairs $(\lambda, t)$,
then the x-variates are fixed, so we are in case (ii); if $\xi_{\lambda t} = 0$ for all pairs $(\lambda, t)$, then the x's
and y's, for each t, are subject to a joint multinormal distribution, as in case (i).


3. THE ASYMPTOTIC SAMPLING VARIANCES 


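In order to derive the variances and covariances of the canonical correlations under the
assumption of mixed variates we write

$$\sum_{t=1}^{T} x_{\lambda t} x_{\lambda' t} = \sigma_{\lambda\lambda'}; \qquad \sum_{t=1}^{T} y_{\mu t} y_{\mu' t} = s_{\mu\mu'}; \qquad \sum_{t=1}^{T} x_{\lambda t} y_{\mu t} = c_{\lambda\mu}. \qquad (3\text{-}1)$$

We then have under the assumptions of canonical correlation theory

$$\sum_{\lambda,\lambda'} \sigma_{\lambda\lambda'} h_\lambda h_{\lambda'} = \sum_{\mu,\mu'} s_{\mu\mu'} k_\mu k_{\mu'} = 1; \qquad \sum_{\lambda,\mu} c_{\lambda\mu} h_\lambda k_\mu = r, \qquad (3\text{-}2)$$

where $h_\lambda$ $(\lambda = 1, \ldots, \Lambda)$ and $k_\mu$ $(\mu = 1, \ldots, M)$ are the coefficients that are to be applied in
order to transform the x's and the y's, respectively, to canonical variates, and r is a canonical
correlation.* Taking differentials,† we find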


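$$2\sum_{\lambda,\lambda'} \sigma_{\lambda\lambda'} h_\lambda\, dh_{\lambda'} + \sum_{\lambda,\lambda'} h_\lambda h_{\lambda'}\, d\sigma_{\lambda\lambda'} = 0,$$

$$2\sum_{\mu,\mu'} s_{\mu\mu'} k_\mu\, dk_{\mu'} + \sum_{\mu,\mu'} k_\mu k_{\mu'}\, ds_{\mu\mu'} = 0, \qquad (3\text{-}3)$$

$$\sum_{\lambda,\mu} c_{\lambda\mu} h_\lambda\, dk_\mu + \sum_{\lambda,\mu} c_{\lambda\mu} k_\mu\, dh_\lambda + \sum_{\lambda,\mu} h_\lambda k_\mu\, dc_{\lambda\mu} = dr.$$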


We may now assume without loss of generality that all variables are in canonical form which
means that all h's and k's vanish except for one pair, $h_1$ and $k_1$ say, which are both equal
to 1. Then (taking account that $\sigma_{11} = s_{11} = 1$, $c_{11} = r_1$) we can simplify as follows:

$$2\,dh_1 + d\sigma_1 = 0; \qquad 2\,dk_1 + ds_1 = 0; \qquad r_1\,dk_1 + r_1\,dh_1 + dc_1 = dr_1, \qquad (3\text{-}4)$$
where $\sigma_1$ stands for $\sigma_{11}$, $s_1$ for $s_{11}$, and $c_1$ for $c_{11}$. From (3-4) we find
$$dr_1 = dc_1 - \tfrac12 r_1 (ds_1 + d\sigma_1) \qquad (3\text{-}5)$$
and likewise for any other canonical correlation, say $r_2$,
$$dr_2 = dc_2 - \tfrac12 r_2 (ds_2 + d\sigma_2). \qquad (3\text{-}6)$$
* For a more detailed account of canonical correlation theory see Hotelling (1936) and Kendall 


(1955, pp. 348-58). 
t In order to do this we disregard, as Hotelling did, any multiple or zero roots in the population. 
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Squaring (3-5) and taking expected values (omitting the subscripts) we obtain to the order 
of 7-1 


JoHN W. Hooper 


varr = varc+}p*[vars + 2 cov (s, 0) + var 7] —pl[cov (c, s) + ev (c, o)], (3-7) 
where p is the parent canonical correlation and the other variables are defined as* 


[7 T T 
c= TaM= BEtw)ys s= Lys 
= =1 t=1 (3-8) 


We notice that o contains a non-stochastic variable £ and a stochastic variable w. We can 
define their respective variances as (HZ being the expected value operator) 


p = E(Xg); 1—p = E(Xu7) (3-9) 


since the expected sum of squares is 1. An evaluation of the terms in (3-7) then gives, 


1 2 ; 
vare = -{l+p%(1—2p*)}; vars = 7,(1—p*p*); 
2 2 
vara = 7 (1—p*); cov (8,0) = 7p%(1—p*); (3-10) 
2 2 
cov (¢, 8) = 7 p(1—p*p*); cov (c,o) = 7p(1—p*). 





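When these results are substituted in (3.7) we obtain:†

THEOREM. Under assumption (iii) the asymptotic sampling variance of a sample canonical
correlation is

$$\operatorname{var} r = \frac{1}{2T}\{(1 - \rho^2)^2(2 - p^2\rho^2)\}. \tag{3.11}$$

Under the same assumption the asymptotic sampling covariance of any pair of canonical
correlations is

$$\operatorname{cov}(r_1, r_2) = 0. \tag{3.12}$$

The second part of the theorem is proved by multiplying (3.5) and (3.6) and taking
expected values. We obtain:

$$\begin{aligned}
\operatorname{cov}(r_1, r_2) = {} & \operatorname{cov}(c_1, c_2) - \tfrac{1}{2}\rho_2[\operatorname{cov}(c_1, s_2) + \operatorname{cov}(c_1, \sigma_2)] - \tfrac{1}{2}\rho_1[\operatorname{cov}(c_2, s_1) + \operatorname{cov}(c_2, \sigma_1)]\\
& + \tfrac{1}{4}\rho_1\rho_2[\operatorname{cov}(s_1, s_2) + \operatorname{cov}(s_1, \sigma_2) + \operatorname{cov}(\sigma_1, s_2) + \operatorname{cov}(\sigma_1, \sigma_2)].
\end{aligned} \tag{3.13}$$

When this expression is evaluated we obtain (3.12) to the order of $T^{-1}$.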
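The substitution of (3.10) into (3.7) can be checked mechanically. The sketch below is not part of the original paper: the function names are illustrative, and exact rational arithmetic is used only so that the identity with (3.11) can be tested without rounding error.

```python
from fractions import Fraction


def moments_3_10(rho, p, T):
    """Return the six asymptotic moments listed in (3.10).

    rho: parent canonical correlation; p: the quantity defined in (3.9);
    T: sample size. Names are illustrative, not taken from the paper.
    """
    return {
        "var_c": (1 + rho**2 * (1 - 2 * p**2)) / T,
        "var_s": 2 * (1 - rho**4 * p**2) / T,
        "var_sigma": 2 * (1 - p**2) / T,
        "cov_s_sigma": 2 * rho**2 * (1 - p**2) / T,
        "cov_c_s": 2 * rho * (1 - rho**2 * p**2) / T,
        "cov_c_sigma": 2 * rho * (1 - p**2) / T,
    }


def var_r_via_3_7(rho, p, T):
    """Combine the moments exactly as prescribed by (3.7)."""
    m = moments_3_10(rho, p, T)
    bracket = m["var_s"] + 2 * m["cov_s_sigma"] + m["var_sigma"]
    return (m["var_c"] + Fraction(1, 4) * rho**2 * bracket
            - rho * (m["cov_c_s"] + m["cov_c_sigma"]))


def var_r_3_11(rho, p, T):
    """Closed form stated in the theorem, equation (3.11)."""
    return (1 - rho**2) ** 2 * (2 - p**2 * rho**2) / (2 * T)


if __name__ == "__main__":
    for rho in (Fraction(1, 4), Fraction(1, 2), Fraction(9, 10)):
        for p in (Fraction(0), Fraction(1, 3), Fraction(1)):
            assert var_r_via_3_7(rho, p, 40) == var_r_3_11(rho, p, 40)
    print("(3.7) with (3.10) reproduces (3.11) on the test grid")
```

The check covers only the algebra of the final substitution; the moments in (3.10) themselves are derived in the Appendix.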


* Since all variables are in canonical form the $y_t$ and $x_t$ in (3.8) are linear combinations of the $y$'s
and $x$'s defined in (2.1) and (2.3).

† The proof of this theorem is given in the Appendix.


4. INTERPRETATION OF THE RESULTS


In addition to the general results given in (3.11) and (3.12) there are several special cases
to consider. When the number of dependent variables is one ($M = 1$) and when the number
of independent variables is also one ($\Lambda = 1$) the canonical correlation becomes the zero-order
correlation. When $\Lambda > M = 1$ the canonical correlation is the multiple correlation
coefficient. Considering these two cases for $p = 0$ we find,*

$$\operatorname{var} r = \frac{1}{T}(1 - \rho^2)^2; \qquad \operatorname{var} R^2 = \frac{4}{T}\mathbf{R}^2(1 - \mathbf{R}^2)^2, \tag{4.1}$$

where $R^2$ is the squared multiple correlation coefficient and $\mathbf{R}^2$ is the corresponding
parameter in the population. These are the usual asymptotic results for correlation theory and
correspond to the case under assumption (i).† When $p = 1$ we find

$$\operatorname{var} r = \frac{1}{2T}(1 - \rho^2)^2(2 - \rho^2); \qquad \operatorname{var} R^2 = \frac{2}{T}\mathbf{R}^2(1 - \mathbf{R}^2)^2(2 - \mathbf{R}^2). \tag{4.2}$$

These results correspond to the case of assumption (ii) where the independent variables
are non-stochastic.‡ The variances in (4.2) will always be less than the corresponding
variances in (4.1) except in the limiting cases of $\rho = \mathbf{R} = 0$ and $\rho = \mathbf{R} = 1$. Intuitively this
seems reasonable since that part of the sampling variation due to the sampling variation
in the independent variables is eliminated when they are non-stochastic.
The results for the zero-order and multiple correlation coefficients for $0 < p < 1$ are given
by

$$\operatorname{var} r = \frac{1}{2T}(1 - \rho^2)^2(2 - p^2\rho^2); \qquad \operatorname{var} R^2 = \frac{2}{T}\mathbf{R}^2(1 - \mathbf{R}^2)^2(2 - \mathbf{R}^2p^2). \tag{4.3}$$

From (4.3) it is seen that the sampling variances are continuous functions of $\rho$, $p$, and $T$.
In Tables 1 and 2 various values for the variances of $r$ and $R^2$ are given for a constant sample
Table 1. Values of $T \operatorname{var} r = \frac{1}{2}(1 - \rho^2)^2(2 - p^2\rho^2)$ for various values of $\rho^2$ and $p^2$

                                          ρ²
  p²      0       0.1      0.2      0.3      0.4      0.5      0.6      0.7      0.8      0.9
 0.0   1.0000  0.81000  0.64000  0.49000  0.36000  0.25000  0.16000  0.09000  0.04000  0.01000
 0.1   1.0000  0.80595  0.63360  0.48265  0.35280  0.24375  0.15520  0.08685  0.03840  0.00955
 0.2   1.0000  0.80190  0.62720  0.47530  0.34560  0.23750  0.15040  0.08370  0.03680  0.00910
 0.3   1.0000  0.79785  0.62080  0.46795  0.33840  0.23125  0.14560  0.08055  0.03520  0.00865
 0.4   1.0000  0.79380  0.61440  0.46060  0.33120  0.22500  0.14080  0.07740  0.03360  0.00820
 0.5   1.0000  0.78975  0.60800  0.45325  0.32400  0.21875  0.13600  0.07425  0.03200  0.00775
 0.6   1.0000  0.78570  0.60160  0.44590  0.31680  0.21250  0.13120  0.07110  0.03040  0.00730
 0.7   1.0000  0.78165  0.59520  0.43855  0.30960  0.20625  0.12640  0.06795  0.02880  0.00685
 0.8   1.0000  0.77760  0.58880  0.43120  0.30240  0.20000  0.12160  0.06480  0.02720  0.00640
 0.9   1.0000  0.77355  0.58240  0.42385  0.29520  0.19375  0.11680  0.06165  0.02560  0.00595
 1.0   1.0000  0.76950  0.57600  0.41650  0.28800  0.18750  0.11200  0.05850  0.02400  0.00550

* We here make use of the fact, for var $R^2$, that var $r^2 = 4\rho^2$ var $r$.
† Cf. Kendall (1952, pp. 336, 385).
‡ Cf. Hooper (1958).

Table 2. Values of $T \operatorname{var} R^2 = 2\mathbf{R}^2(1 - \mathbf{R}^2)^2(2 - \mathbf{R}^2p^2)$ for various values of $\mathbf{R}^2$ and $p^2$

                                          R²
  p²    0     0.1      0.2      0.3      0.4      0.5      0.6      0.7      0.8      0.9     1.0
 0.0    0   0.32400  0.51200  0.58800  0.57600  0.50000  0.38400  0.25200  0.12800  0.03600   0
 0.1    0   0.32238  0.50688  0.57918  0.56448  0.48750  0.37248  0.24318  0.12288  0.03438   0
 0.2    0   0.32076  0.50176  0.57036  0.55296  0.47500  0.36096  0.23436  0.11776  0.03276   0
 0.3    0   0.31914  0.49664  0.56154  0.54144  0.46250  0.34944  0.22554  0.11264  0.03114   0
 0.4    0   0.31752  0.49152  0.55272  0.52992  0.45000  0.33792  0.21672  0.10752  0.02952   0
 0.5    0   0.31590  0.48640  0.54390  0.51840  0.43750  0.32640  0.20790  0.10240  0.02790   0
 0.6    0   0.31428  0.48128  0.53508  0.50688  0.42500  0.31488  0.19908  0.09728  0.02628   0
 0.7    0   0.31266  0.47616  0.52626  0.49536  0.41250  0.30336  0.19026  0.09216  0.02466   0
 0.8    0   0.31104  0.47104  0.51744  0.48384  0.40000  0.29184  0.18144  0.08704  0.02304   0
 0.9    0   0.30942  0.46592  0.50862  0.47232  0.38750  0.28032  0.17262  0.08192  0.02142   0
 1.0    0   0.30780  0.46080  0.49980  0.46080  0.37500  0.26880  0.16380  0.07680  0.01980   0








size $T$. It may be noted that the relative change in var $r$ as $p$ varies, for a constant $\rho^2$ and $T$,
is an increasing function of $\rho^2$. Thus for $\rho^2 > 0.9$ the relative change for $0 < p < 1$ is greater
than 50 %. This indicates that the correct specification as to whether the model conforms to
assumption (i) or (ii) may be important in applications where such large values of $\rho^2$ are
obtained.
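The entries of Tables 1 and 2 follow directly from (4.3). The short sketch below regenerates them; the function names and the grid layout are illustrative choices, not something specified in the paper.

```python
def t_var_r(rho2, p2):
    """T * var r from (4.3), as a function of rho^2 and p^2."""
    return 0.5 * (1 - rho2) ** 2 * (2 - p2 * rho2)


def t_var_R2(R2, p2):
    """T * var R^2 from (4.3), as a function of the population R^2 and p^2."""
    return 2 * R2 * (1 - R2) ** 2 * (2 - R2 * p2)


if __name__ == "__main__":
    grid = [i / 10 for i in range(11)]
    print("Table 1 (rows p^2, columns rho^2 = 0.0 ... 0.9)")
    for p2 in grid:
        print(f"{p2:.1f}", " ".join(f"{t_var_r(r2, p2):.5f}" for r2 in grid[:-1]))
    print("Table 2 (rows p^2, columns R^2 = 0.0 ... 1.0)")
    for p2 in grid:
        print(f"{p2:.1f}", " ".join(f"{t_var_R2(R2, p2):.5f}" for R2 in grid))
```

Both functions are linear in $p^2$ for fixed $\rho^2$ or $\mathbf{R}^2$, which is why each column of the tables changes by a constant step from one row to the next.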

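Finally, it appears that the asymptotic variance of the zero-order ($M = \Lambda = 1$) regression
coefficient is independent of $p$.* This agrees with the well-known fact that the variance
is identical for the two limiting cases of $p = 0$ and $p = 1$. Thus, we see that while the choice
of models corresponding to assumptions (i), (ii), or (iii) makes no difference to the asymptotic
variance of the zero-order regression coefficient it may lead to a considerable difference in
the variance of the zero-order correlation coefficient.

APPENDIX
We first prove the results given in (3.10). We write for our model in canonical form

$$y_t = \rho(\xi_t + w_t) + u_t. \tag{A.1}$$

Substituting (A.1) in (3.8) we obtain

$$\begin{aligned}
c &= \sum_t [(\xi_t + w_t)(\rho\xi_t + \rho w_t + u_t)];\\
s &= \sum_t [\rho^2(\xi_t + w_t)^2 + u_t^2 + 2\rho(\xi_t + w_t)u_t];\\
\sigma &= \sum_t [\xi_t^2 + w_t^2 + 2\xi_t w_t].
\end{aligned} \tag{A.2}$$

Taking expected values and remembering that all terms containing $w_t$ or $u_t$ linearly vanish since

$$E(w_t u_t) = E(w_t) = E(u_t) = 0,$$

we find*

$$E(c) = \rho\sum\xi_t^2 + \rho(1 - p); \qquad E(s) = \rho^2\sum\xi_t^2 + (1 - \rho^2p); \qquad E(\sigma) = \sum\xi_t^2 + (1 - p). \tag{A.3}$$

Subtracting the expressions in (A.3) from the corresponding ones in (A.2) we find

$$\begin{aligned}
dc &= 2\rho\sum\xi_t w_t + \sum[\xi_t u_t + \rho w_t^2 + w_t u_t] - \rho(1 - p);\\
ds &= 2\sum[\rho^2\xi_t w_t + \rho\xi_t u_t + \rho w_t u_t] + \sum[\rho^2 w_t^2 + u_t^2] - (1 - \rho^2p);\\
d\sigma &= 2\sum\xi_t w_t + \sum w_t^2 - (1 - p).
\end{aligned} \tag{A.4}$$

Squaring and taking the expected value of the first expression in (A.4) we obtain

$$\begin{aligned}
\operatorname{var} c = {} & E\Big[4\rho^2\Big(\sum\xi_t w_t\Big)^2 + \Big(\sum\xi_t u_t + \rho\sum w_t^2 + \sum w_t u_t\Big)^2 + \rho^2(1 - p)^2\Big]\\
& + E\Big[4\rho\sum\xi_t w_t \sum(\xi_t u_t + \rho w_t^2 + w_t u_t) - 4\rho^2(1 - p)\sum\xi_t w_t\Big] - E\Big[2\rho(1 - p)\sum(\xi_t u_t + \rho w_t^2 + w_t u_t)\Big].
\end{aligned} \tag{A.5}$$

If we now omit all zero terms (a term is zero when it contains $w_t$ or $u_t$ linearly) we find

$$\begin{aligned}
\operatorname{var} c = {} & 4\rho^2\sum_{t,t'} E(\xi_t w_t \xi_{t'} w_{t'}) + \sum_{t,t'} E(\xi_t u_t \xi_{t'} u_{t'})\\
& + \rho^2\sum_{t,t'} E(w_t^2 w_{t'}^2) + \sum_{t,t'} E(w_t u_t w_{t'} u_{t'}) - 2\rho^2(1 - p)\sum E(w_t^2) + \rho^2(1 - p)^2.
\end{aligned} \tag{A.6}$$

The evaluation of all terms except the third is straightforward. For the third term we have

$$\rho^2\sum_{t,t'} E(w_t^2 w_{t'}^2) = \rho^2\sum_{t \neq t'} E(w_t^2 w_{t'}^2) + \rho^2\sum_t E(w_t^4) = \frac{\rho^2T(T - 1)(1 - p)^2}{T^2} + \frac{3\rho^2T(1 - p)^2}{T^2} = \frac{T(T + 2)}{T^2}\rho^2(1 - p)^2,$$

making use of the property that the fourth moment of a normal variate is three times its squared
variance. So we obtain

$$\begin{aligned}
\operatorname{var} c = {} & \frac{4}{T}\rho^2p(1 - p) + \frac{1}{T}p(1 - \rho^2) + \frac{T(T + 2)}{T^2}\rho^2(1 - p)^2\\
& + \frac{1}{T}(1 - p)(1 - \rho^2) - 2\rho^2(1 - p)^2 + \rho^2(1 - p)^2 = \frac{1}{T}\{1 + \rho^2(1 - 2p^2)\}.
\end{aligned} \tag{A.7}$$

Squaring the second expression in (A.4) and taking expected values (omitting zero terms)† we find

$$\begin{aligned}
\operatorname{var} s = {} & E\Big\{4\rho^2\Big[\rho^2\Big(\sum\xi_t w_t\Big)^2 + \Big(\sum\xi_t u_t\Big)^2 + \Big(\sum w_t u_t\Big)^2\Big] + 2\rho^2\sum w_t^2\sum u_t^2\Big\}\\
& + E\Big\{\sum_{t,t'}[\rho^4 w_t^2 w_{t'}^2 + u_t^2 u_{t'}^2] - 2(1 - \rho^2p)\sum[\rho^2 w_t^2 + u_t^2]\Big\} + (1 - \rho^2p)^2 = \frac{2}{T}(1 - \rho^4p^2).
\end{aligned} \tag{A.8}$$

Squaring the third expression in (A.4) and taking expected values (omitting zero terms) we have

$$\operatorname{var}\sigma = E\Big\{4\Big(\sum\xi_t w_t\Big)^2 + \Big(\sum w_t^2\Big)^2 - 2(1 - p)\sum w_t^2\Big\} + (1 - p)^2 = \frac{2}{T}(1 - p^2). \tag{A.9}$$
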
* Henceforth, the order of summation will be over $t = 1, \dots, T$ unless otherwise specified.
† A term may also vanish because it contains $w_t^3$ or $u_t^3$ which lead to the third moments of normal
variates.

Multiplying the first and second expressions of (A.4) and taking expected values (omitting zero terms)
we obtain

$$\begin{aligned}
\operatorname{cov}(c, s) = {} & E\Big[4\rho^3\Big(\sum\xi_t w_t\Big)^2 + 2\rho\Big(\sum\xi_t u_t\Big)^2 + \rho^3\Big(\sum w_t^2\Big)^2 + 2\rho\Big(\sum w_t u_t\Big)^2 + \rho\sum w_t^2\sum u_t^2\Big]\\
& - E\Big[\rho(1 - \rho^2p)\sum w_t^2 + \rho^3(1 - p)\sum w_t^2 + \rho(1 - p)\sum u_t^2\Big] + \rho(1 - p)(1 - \rho^2p)\\
= {} & \frac{2}{T}\rho(1 - \rho^2p^2).
\end{aligned} \tag{A.10}$$

Multiplying the first and third expressions of (A.4) and taking expected values (omitting zero terms)
we obtain

$$\operatorname{cov}(c, \sigma) = E\Big[4\rho\Big(\sum\xi_t w_t\Big)^2 + \rho\Big(\sum w_t^2\Big)^2 - 2\rho(1 - p)\sum w_t^2 + \rho(1 - p)^2\Big] = \frac{2}{T}\rho(1 - p^2). \tag{A.11}$$

Multiplying the second and third expressions of (A.4) and taking expected values (omitting zero terms)
we find

$$\begin{aligned}
\operatorname{cov}(s, \sigma) = {} & E\Big[4\rho^2\Big(\sum\xi_t w_t\Big)^2 + \rho^2\Big(\sum w_t^2\Big)^2 + \sum w_t^2\sum u_t^2\Big]\\
& - E\Big[\rho^2(1 - p)\sum w_t^2 + (1 - p)\sum u_t^2 + (1 - \rho^2p)\sum w_t^2\Big] + (1 - \rho^2p)(1 - p)\\
= {} & \frac{2}{T}\rho^2(1 - p^2).
\end{aligned} \tag{A.12}$$

This proves the results given in (3.10) and hence the first part of the theorem.

In order to derive the covariance between any two canonical correlations, say $r_1$ and $r_2$, we can
regard the expressions in (A.4) as applying to $r_1$ by attaching the subscript 1 to the variables $\xi_t$, $u_t$, $w_t$,
$\rho$, and $p$. Similarly, we have for $r_2$:

$$\begin{aligned}
dc_2 &= 2\rho_2\sum\xi_{2t}w_{2t} + \sum[\xi_{2t}u_{2t} + \rho_2 w_{2t}^2 + w_{2t}u_{2t}] - \rho_2(1 - p_2);\\
ds_2 &= 2\sum[\rho_2^2\xi_{2t}w_{2t} + \rho_2\xi_{2t}u_{2t} + \rho_2 w_{2t}u_{2t}] + \sum[\rho_2^2 w_{2t}^2 + u_{2t}^2] - (1 - \rho_2^2p_2);\\
d\sigma_2 &= 2\sum\xi_{2t}w_{2t} + \sum w_{2t}^2 - (1 - p_2).
\end{aligned} \tag{A.13}$$

Multiplying the three expressions in (A.4) by each of the three expressions in (A.13) and taking expected
values we obtain the nine covariance terms of (3.13). Applying this procedure to $dc_1$ and $dc_2$ we have
(omitting zero terms)

$$\operatorname{cov}(c_1, c_2) = E\Big[\rho_1\rho_2\sum w_{1t}^2\sum w_{2t}^2 - \rho_1\rho_2(1 - p_2)\sum w_{1t}^2 - \rho_1\rho_2(1 - p_1)\sum w_{2t}^2\Big] + \rho_1\rho_2(1 - p_1)(1 - p_2) = 0. \tag{A.14}$$

In a similar manner, as the reader may easily verify, all other covariance terms in (3.13) are zero, which
proves the second part of the theorem.
For the asymptotic sampling variance of the zero-order regression coefficient $b$ we have

$$\frac{\operatorname{var} b}{\beta^2} = \frac{\operatorname{var} c}{\{E(c)\}^2} + \frac{\operatorname{var}\sigma}{\{E(\sigma)\}^2} - \frac{2\operatorname{cov}(c, \sigma)}{\{E(c)\}\{E(\sigma)\}}. \tag{A.15}$$

Upon substituting from (A.3), (A.7), (A.9) and (A.11) we obtain the familiar expression ($\beta$ being the
population parameter)

$$\operatorname{var} b = \frac{1}{T}\beta^2\frac{(1 - \rho^2)}{\rho^2}, \tag{A.16}$$

all terms containing $p$ having vanished. This proves the remark made at the end of § 4.
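The cancellation of $p$ in (A.16) can be confirmed numerically. The sketch below evaluates the right-hand side of (A.15) under the normalisation $\sum\xi_t^2 = p$, so that (A.3) gives $E(c) = \rho$ and $E(\sigma) = 1$; the function name is illustrative and exact fractions are used only to make the comparison exact.

```python
from fractions import Fraction


def relative_var_b(rho, p, T):
    """Right-hand side of (A.15), i.e. var b / beta^2.

    Uses (A.7), (A.9) and (A.11) for the moments and (A.3) with
    sum(xi_t^2) = p, which gives E(c) = rho and E(sigma) = 1.
    """
    var_c = (1 + rho**2 * (1 - 2 * p**2)) / T
    var_sigma = 2 * (1 - p**2) / T
    cov_c_sigma = 2 * rho * (1 - p**2) / T
    e_c, e_sigma = rho, 1
    return (var_c / e_c**2 + var_sigma / e_sigma**2
            - 2 * cov_c_sigma / (e_c * e_sigma))


if __name__ == "__main__":
    rho, T = Fraction(3, 5), 30
    target = (1 - rho**2) / (T * rho**2)   # (A.16) divided by beta^2
    for p in (Fraction(0), Fraction(1, 4), Fraction(1, 2), Fraction(1)):
        assert relative_var_b(rho, p, T) == target
    print("var b / beta^2 does not depend on p:", target)
```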
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THE MEAN DEVIATION, WITH SPECIAL REFERENCE TO SAMPLES 


FROM A PEARSON TYPE III POPULATION 


By N. L. JOHNSON 
University College London 


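1. Let $x_1, x_2, \dots, x_n$ be independent random variables, each following the Type III
distribution

$$p(x) = \frac{1}{\Gamma(\alpha)}x^{\alpha-1}e^{-x} \qquad (x > 0),$$

which has moments and moment ratios

$$\mathcal{E}(x) = \alpha = \operatorname{var}(x); \qquad \beta_1(x) = 4\alpha^{-1}; \qquad \beta_2(x) = 3 + 6\alpha^{-1}.$$

The joint probability density function of $x_1, x_2, \dots, x_n$ is

$$p(x_1, x_2, \dots, x_n) = \frac{1}{(\Gamma(\alpha))^n}\Big(\prod_{i=1}^{n} x_i\Big)^{\alpha-1}e^{-\sum x_i} \qquad (x_i > 0).$$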





We now make the transformation 


n 


u, = 
a 


~— 
| 
| 
8 
w 
| 
— 
3 


n . 
Wy = Say (ij 


which has the Jacobian 


ll 

bo 
. 
we 

= 
— 
. 


O(Uy, Ue. wa) ( n ip 
O(a:, 2o « rk, n ; 


We find 


{(n— nyn la 1 n a—-1 n i in 
P(Uy, Us, ..-, Un) = : - m+. x 4; I] uj—t eyas"i4— | YU; > — NU). 
2 j=2 


(T'(a))” 


j=2 
Hence, for wu, >0 





= Im Un—1) a ica) a) n a—1 a—l 
plu) = {(n- Tea - es “ (u + >> u,) ( ll w,) € — Eu duy... dup. 
\ d 


#0 
Applying Dirichlet’s formula, we have 


> ee {(n —1)/njo- _— —u ra 4 ie n—1)a—1 p—t 
p(u,) = l(a) T((n—1)a ye ‘| (m+ 5) tt edt, 


L 
where L=0 if u,>0, 
L=-nu, if u,<09. 


Hence if wu, > 0, and provided & is an integer 


yh SS 


T(a)P((n—l)a)° |= 


f(n—1)/n}"—-Ve a-—1 ‘a—-] 
( / j 
P(u) = 

= 


rT. ee e—] — a. 
, )n C((n—1l)a+r) ug 


ust. 


i.e. play) = (So) eng alee [(n—lo+r—1] 


x 


n r=0 nr! (a—1 —r)! 


(1) 


(2) 


(3)

The mean deviation of the $n$ values $x_1, x_2, \dots, x_n$ is $m = \frac{1}{n}\sum_{i=1}^{n}|x_i - \bar{x}|$ and the expected
value of $m$ is

$$\mathcal{E}(m) = \mathcal{E}(|x_1 - \bar{x}|),$$

which is proportional to $\mathcal{E}(|u_1|)$. Since $\mathcal{E}(u_1) = 0$,

$$\mathcal{E}(|u_1|) = 2\,\mathcal{E}(u_1 \mid u_1 > 0)\Pr\{u_1 > 0\} = 2\int_0^\infty u_1\,p(u_1)\,du_1,$$

and inserting (3) and integrating term by term gives

$$\mathcal{E}(m) = 2\Big(\frac{n-1}{n}\Big)^{(n-1)\alpha+1}\frac{\alpha}{(\alpha-1)!}\cdot\frac{(n\alpha-\alpha+1)(n\alpha-\alpha+2)\cdots(n\alpha-1)}{n^{\alpha-1}} \tag{4}$$

for $\alpha \ge 2$, while for $\alpha = 1$

$$\mathcal{E}(m) = 2\Big(\frac{n-1}{n}\Big)^{n}. \tag{4'}$$

For large $n$, expansion in powers of $n^{-1}$ gives

$$\mathcal{E}(m) \doteq \frac{2\alpha^{\alpha}e^{-\alpha}}{(\alpha-1)!}\Big(1 - \frac{1}{2n}\Big) \tag{5}$$
: - m 2a% e-% 3 
and Se on “ait (6) 


By analogy with the Normal case, and also considering the general relation

$$\operatorname{var}(x_i - \bar{x}) = \frac{n-1}{n}\operatorname{var}(x_i - \mathcal{E}(x)),$$

the alternative approximation

$$\mathcal{E}(m) \doteq \frac{2\alpha^{\alpha}e^{-\alpha}}{(\alpha-1)!}\sqrt{\Big(\frac{n-1}{n}\Big)} \tag{5'}$$

is suggested.

Table 1, below, shows some typical values for the ratio of expected values of mean
deviation to the standard deviation ($\sqrt{\alpha}$) of the distribution.

Table 2 gives some approximate values of this ratio, calculated from (5)', which is a
considerably better approximation formula than (5).
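The exact and approximate ratios can be compared with a few lines of code. The sketch below is an illustration rather than the author's own computation: it writes the exact expectation in a gamma-function form, $\mathcal{E}(m) = 2\{(n-1)/n\}^{(n-1)\alpha}\,n^{-\alpha}\,\Gamma(n\alpha)/\{\Gamma(\alpha)\Gamma((n-1)\alpha)\}$, which is assumed here because it agrees with (4) and (4)' at integer $\alpha$ and with the tabulated values; the function names are hypothetical.

```python
import math


def exact_ratio(n, alpha):
    """E(m)/sigma for a sample of n from a Type III population (sigma = sqrt(alpha)).

    Gamma-function form assumed equivalent to (4) and (4)' for integer alpha.
    Logarithms are used to avoid overflow for large n * alpha.
    """
    log_em = (math.log(2.0)
              + (n - 1) * alpha * math.log((n - 1) / n)
              - alpha * math.log(n)
              + math.lgamma(n * alpha)
              - math.lgamma(alpha)
              - math.lgamma((n - 1) * alpha))
    return math.exp(log_em) / math.sqrt(alpha)


def population_ratio(alpha):
    """Limiting (n = infinity) ratio 2 alpha^(alpha - 1/2) e^(-alpha) / Gamma(alpha)."""
    log_em = math.log(2.0) + alpha * math.log(alpha) - alpha - math.lgamma(alpha)
    return math.exp(log_em) / math.sqrt(alpha)


def approx_ratio(n, alpha):
    """Approximation (5)': the population ratio times sqrt((n - 1) / n)."""
    return population_ratio(alpha) * math.sqrt((n - 1) / n)


if __name__ == "__main__":
    for n in (3, 6, 10, 30):
        row = [f"{exact_ratio(n, a):.4f}/{approx_ratio(n, a):.4f}" for a in (1, 2, 4, 8)]
        print(n, "  ".join(row))
```

Each printed pair is exact/approximate, so the rows can be set against Tables 1 and 2 for the same $n$ and $\alpha$.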


2. If $m$ is to be used as the basis for an estimator of the population standard deviation,
$\sigma$, then an unbiased estimator would be $m(\sigma/\mathcal{E}(m))$.

In the case considered in §1 the exact value of the multiplying factor depends on the
parameter $\alpha$ (as well as on the sample size, $n$). It is, in fact, the reciprocal of the function
shown in Table 1. For values of $n$ greater than, say, 15, however, it appears that the likely
inaccuracy in $\alpha$ will not be of so large an amount as to produce a really serious error in the
value used for $\sigma/\mathcal{E}(m)$.
Table 1. Ratio $\mathcal{E}(m)/\sigma$ for samples from Type III populations

                                        α
   n      1*       2       3       4       5       6       7       8      ∞*
   3   0.5926  0.6208  0.6307  0.6359  0.6389  0.6410  0.6425  0.6436  0.6515
   4   0.6328  0.6607  0.6706  0.6756  0.6786  0.6807  0.6821  0.6832  0.6910
   5   0.6554  0.6833  0.6932  0.6982  0.7013  0.7033  0.7048  0.7059  0.7136
   6   0.6698  0.6979  0.7078  0.7129  0.7160  0.7180  0.7195  0.7206  0.7284
   8   0.6872  0.7165  0.7256  0.7307  0.7338  0.7359  0.7374  0.7385  0.7464
  10   0.6974  0.7260  0.7360  0.7412  0.7443  0.7464  0.7479  0.7490  0.7569
  12   0.7040  0.7328  0.7429  0.7481  0.7513  0.7534  0.7549  0.7560  0.7639
  15   0.7105  0.7395  0.7497  0.7549  0.7580  0.7602  0.7617  0.7628  0.7708
  20   0.7170  0.7461  0.7564  0.7616  0.7648  0.7669  0.7685  0.7696  0.7777
  30   0.7233  0.7527  0.7630  0.7683  0.7715  0.7736  0.7752  0.7763  0.7845
   ∞   0.7358  0.7656  0.7761  0.7815  0.7847  0.7869  0.7884  0.7896  0.7979

* When $\alpha = 1$ the population is exponential, when $\alpha = \infty$ it is Normal.

Table 2. Values of the approximate formula $\dfrac{2\alpha^{\alpha-\frac{1}{2}}e^{-\alpha}}{(\alpha-1)!}\sqrt{\Big(\dfrac{n-1}{n}\Big)}$ for the ratio of Table 1

                                        α
   n      1*       2       3       4       5       6       7       8      ∞*
   3   0.6008  0.6251  0.6337  0.6381  0.6407  0.6425  0.6437  0.6447  0.6515
   6   0.6717  0.6989  0.7085  0.7134  0.7163  0.7184  0.7197  0.7208  0.7284
  10   0.6981  0.7263  0.7363  0.7414  0.7444  0.7465  0.7480  0.7491  0.7569
  15   0.7109  0.7396  0.7498  0.7550  0.7581  0.7602  0.7617  0.7628  0.7708
  20   0.7172  0.7462  0.7565  0.7617  0.7648  0.7670  0.7685  0.7696  0.7777
  30   0.7234  0.7527  0.7631  0.7684  0.7715  0.7737  0.7752  0.7763  0.7845

* When $\alpha = 1$ the population is exponential, when $\alpha = \infty$ it is Normal.
Turning now to the general question of estimating $\sigma$ from a sample value of $m$, it is
useful to compare the standard deviations of $m(\sigma/\mathcal{E}(m))$ and $s$, the sample standard deviation.
We will assume that the correct multiplier $\sigma/\mathcal{E}(m)$ is used, and neglect the small bias
in $s$, regarded as an estimator of $\sigma$. If the sample size, $n$, is large, then

$$\operatorname{var}(s) \doteq \frac{\sigma^2}{4n}(\beta_2 - 1), \tag{7}$$

where $\beta_2$ is the second moment-ratio of the distribution of the observed variable $x$.

A large sample approximation to the variance of $m$ may be obtained as follows.
If $n$ is large, $\bar{x} \doteq \mathcal{E}(x)$ and

$$\mathcal{E}(m) \doteq (1/n)\,n\,\mathcal{E}(|x - \mathcal{E}(x)|) = \mathcal{E}(|x - \mathcal{E}(x)|),$$

$$\mathcal{E}(m^2) \doteq (1/n^2)[n\,\mathcal{E}(|x - \mathcal{E}(x)|^2) + n(n - 1)\{\mathcal{E}(|x - \mathcal{E}(x)|)\}^2] = (1/n)\operatorname{var}(x) + \Big(1 - \frac{1}{n}\Big)[\mathcal{E}(m)]^2.$$

Hence

$$\operatorname{var}(m) = \mathcal{E}(m^2) - [\mathcal{E}(m)]^2 \doteq (1/n)[\operatorname{var}(x) - [\mathcal{E}(m)]^2], \tag{8}$$

where $\mathcal{E}(m)$ now stands for the population ($n = \infty$) value of the mean deviation.
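Approximations (7) and (8) are simple enough to evaluate directly. The sketch below does so for a unit Normal population, for which $\beta_2 = 3$ and the population mean deviation is $\sqrt{2/\pi}$; the function names and default arguments are illustrative assumptions.

```python
import math


def approx_var_s(n, sigma=1.0, beta2=3.0):
    """Large-sample approximation (7): var(s) ~ sigma^2 (beta2 - 1) / (4 n)."""
    return sigma**2 * (beta2 - 1) / (4 * n)


def approx_var_m(n, sigma=1.0, mean_dev=math.sqrt(2 / math.pi)):
    """Large-sample approximation (8): var(m) ~ (var(x) - E(m)^2) / n.

    mean_dev is the population (n = infinity) mean deviation; the default
    is the unit Normal value sqrt(2 / pi).
    """
    return (sigma**2 - mean_dev**2) / n


if __name__ == "__main__":
    for n in (5, 10, 20, 30):
        print(n, f"{approx_var_s(n):.5f}", f"{approx_var_m(n):.5f}")
```

For other populations only `beta2` and `mean_dev` need to be changed, for example to the Type III values $3 + 6\alpha^{-1}$ and $2\alpha^{\alpha}e^{-\alpha}/\Gamma(\alpha)$ with `sigma` set to $\sqrt{\alpha}$.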


Table 3. Comparison of exact and approximate formulae for var (s) and var (m)
in samples from a unit Normal population

              var (s)                          var (m)
   n     Exact    Approx. (7)      Exact    Approx. (8)   Approx. (9)
   5    0.09314    0.10000        0.07094     0.07268       0.07093
  10    0.04854    0.05000        0.03589     0.03634       0.03590
  20    0.02466    0.02500        0.01806     0.01817       0.01806
  30    0.01652    0.01667        0.01206     0.01211       0.01206







To give some idea of the accuracy of these approximations, Table 3 shows (i) the exact
values for var (m) in samples from a unit Normal population (Pearson & Hartley, 1954) and
the values given by the approximate formula (8), and (ii) a similar comparison for the exact
value of var (s) and the value given by approximation (7). Both approximations appear to
be sufficiently accurate for our purposes, at any rate for Normal populations. It is interesting
to note that a very good approximation to var (m) is given by the simply modified formula

$$\operatorname{var}(m) \doteq \frac{1}{n}\Big(1 - \frac{2}{\pi}\Big)\Big(1 - \frac{0.12}{n}\Big). \tag{9}$$

Values given by this formula are also shown in Table 3. Using (7) and (8), we have

$$\sqrt{n} \times (\text{coefficient of variation of } s) \doteq \tfrac{1}{2}\sqrt{(\beta_2 - 1)},$$

$$\sqrt{n} \times (\text{coefficient of variation of } m) \doteq \sqrt{\Big(\frac{\sigma^2}{[\mathcal{E}(m)]^2} - 1\Big)}.$$

Hence, so far as the approximate comparison is valid, $m(\sigma/\mathcal{E}(m))$ will have a smaller
standard deviation than $s$ if

$$\frac{4\sigma^2}{[\mathcal{E}(m)]^2} - 3 < \beta_2. \tag{10}$$

The table below gives limiting values of $\beta_2$ for a few commonly occurring values of $\mathcal{E}(m)/\sigma$.

   E(m)/σ    4σ²/[E(m)]² − 3
    0.70          5.16
    0.75          4.11
    0.80          3.25
    0.85          2.54

Hence, even for a moderate amount of leptokurtosis the sample mean deviation may
provide the unbiased estimator (of population standard deviation) with the smaller standard
deviation. It must, however, be remembered that it is necessary to multiply the sample
mean deviation, $m$, by the factor $\sigma/\mathcal{E}(m)$ to provide an unbiased estimator of $\sigma$, and the form
of distribution may not be known sufficiently precisely to give an exact value for this factor.
On the other hand, the value of the factor does not change very rapidly with change in form
of distribution.
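The limiting values of $\beta_2$ in the small table above follow from equating the two coefficients of variation, which gives the threshold $4\sigma^2/[\mathcal{E}(m)]^2 - 3$. A minimal sketch (the function name is illustrative):

```python
def limiting_beta2(ratio):
    """Smallest beta_2 for which m * sigma / E(m) beats s, given ratio = E(m) / sigma.

    Obtained by requiring sigma^2 / E(m)^2 - 1 < (beta_2 - 1) / 4.
    """
    return 4.0 / ratio**2 - 3.0


if __name__ == "__main__":
    for ratio in (0.70, 0.75, 0.80, 0.85):
        print(f"{ratio:.2f}  {limiting_beta2(ratio):.2f}")
```

For the Normal value $\mathcal{E}(m)/\sigma = \sqrt{2/\pi} \approx 0.798$ the threshold is about 3.28, slightly above the Normal $\beta_2 = 3$, so under this approximation $s$ keeps the smaller standard deviation for Normal data.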

It may be noted that an approximation of the form used in Table 2, namely

$$\frac{\text{Expected value of mean deviation in sample of size } n}{\text{Population value of mean deviation}} \doteq \sqrt{\Big(\frac{n-1}{n}\Big)},$$

would enable an approximately unbiased estimator of population standard deviation, based
on the sample mean deviation, to be obtained. The approximation of equation (8) might
then be used to estimate the standard error of this estimate.

For example, taking the case of the Type VII distribution dealt with in the following
section and supposing we had a sample of $n = 5$ from a population with $\alpha = 4$, the population
standard deviation would be estimated by

$$m \times \big(0.759\sqrt{\tfrac{4}{5}}\big)^{-1} = 1.473m.$$

The standard error of this estimate would be estimated as


1-473 x 2473/ 
v5 | 
3. Application of (10) to the Type III distribution gives results shown in Table 4. 


Tables 5 and 6, respectively, give the results of similar calculations for the Type VII 
distribution 


4 
]— (0-759) = 0-623m. 


and for the Type II distribution 


1 
p(x) = ——(a(l—2z))* (O<ax<l). 
Pe) = paxriaxn mar ) 
From the above three tables, it can be seen that $m(\sigma/\mathcal{E}(m))$ has, to the degree of approximation
used, a smaller standard deviation than $s$ for symmetrical Pearson curves with $\beta_2 > 3.5$
and for Type III curves with $\beta_2 > 3.35$. These values of $\beta_2$ correspond to only moderate
leptokurtosis.
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For finite values of n (the sample size) the criterion needs modification. A first approxi- 
mation to the modified form of criterion replaces (10) by 
40° * 1 
[simp °<P2— 5 (Aa 9)- 
where &(m) is still the population (n = 00) value of the mean deviation. A glance at Tables 
4-6 shows that this modification will not seriously affect our conclusions if the sample size 


is about fifteen or more. 
Table 4. Type III

    α     E(m)/σ     X*      β₂
    1     0.736     4.4     9
    2     0.766     3.8     6
    3     0.776     3.6     5
    4     0.782     3.55    4.5
    5     0.785     3.50    4.2
    6     0.787     3.46    4
    7     0.788     3.44    3.86
    8     0.790     3.42    3.75
   12     0.792     3.37    3.5
   16     0.794     3.35    3.375
   20     0.795     3.34    3.3
    ∞     0.798     3.28    3.0

$\sigma^2 = \alpha$;  $\beta_2 = 3 + 6\alpha^{-1}$;  $\mathcal{E}(m) = \dfrac{2\alpha^{\alpha}e^{-\alpha}}{(\alpha-1)!}$.
Table 5. Type VII 
BES PPBEE OPTRA TA Aw ee ae 
fl | a é(m)/o xX* | Be ot E(m)jo | X* | , 
| | | | 
t | - | | | 
| | | 
Sty 0-637 | 68 $| 6 | 0-776 364 | 386 | 
g | 3 | 0-735 4-4 9 8 | 0-783 | 353 3-55 | 
- 4 0-759 3-9 5 10 0-786 3-47 3-4 | 
5 | 0-770 3°75 4:2 eo 0-798 3-28 3-0 
Spee Ee Pee y ee ee Yee gem ins ore 
o? = (2a—3)-1; £,=3(2a—3) (2a—5)-1; &(m)= ; Fs 
ie : (a—1) Bis, «— 4) 
Table 6. Type II 
x | &(m)/o | xX* | Be | 
1. rvewlare wy on 
T 0 (rectangular) | 0-866 2-33 1-80 
l 0-839 | 2-69 2-14 
2 | 0-827 2-85 2-33 
3 | 0-820 2-95 2-45 | 
} | co (Normal) 0-798 3-28 3-00 | 
2 } 
2®—1(2 3); f= 3(2 3) (2 5)-1; 6 = ———— 4 —2x) (a(1—2x))*dx.t 
o*? = $(2a+3) Pe (2a +3) (20+ 5) (m) arrears Ke ) (a( ))*e 
* $X = \dfrac{4\sigma^2}{(\mathcal{E}(m))^2} - 3$.   † For large samples.

AN APPROXIMATION TO THE DISTRIBUTION OF NON-CENTRAL t


By MAXINE MERRINGTON AND E. S. PEARSON
University College London 


1. INTRODUCTION 

The publication by Resnikoff & Lieberman (1957) of very extensive tables of the non-central 
t-distribution, reviewed elsewhere in this issue, has been a welcome achievement. Apart 
from providing a quick solution to a number of specific problems described in the introduc- 
tion, the Tables make possible further investigation into methods of approximating to this 
interesting distribution. Two considerations suggested the inquiry described in the present
paper: (a) Interpolation for the non-central parameter $\delta$ is not altogether easy in the
Resnikoff & Lieberman Tables and low values of $\delta$ are not covered.* (b) A few isolated
comparisons had shown that the non-central $t$-distribution could be closely represented by
a Pearson Type IV curve.

With regard to point (a), it should be stated that the authors of the Tables had primarily
in mind their use in certain quality control problems, where the chosen values of $\delta$ (ten for
each value of the degrees of freedom $f$) were particularly appropriate. Nevertheless, as will
be seen when the diagrams printed below are discussed, a substantial class of distributions
is not covered by the Tables.

In the following pages we shall first explore the relationship between the moment ratios
$\beta_1(t)$, $\beta_2(t)$ of the distribution and the parameters $f$ and $\delta$. The field of $\beta_1(t)$, $\beta_2(t)$ will be found
to be almost exactly that of a Pearson Type IV curve. We shall then examine the extent to
which the tables of percentage points of standardized Pearson curves (Pearson & Hartley,
1954, Table 42) can be used to give correct values for the percentage points of non-central $t$.
While we have no doubt that alternative and perhaps more accurate methods could be
found to supplement the Resnikoff & Lieberman Tables,† we think that the present investigation
illustrates the value of the suggestion recently made to us by Professor John Tukey:
namely that we should extend the table of standardized percentage points for Pearson
curves, both as regards the number of points tabled, the number of decimals given and the
range of $\beta_1$, $\beta_2$ values included.


2. THE DISTRIBUTION AND MOMENTS OF NON-CENTRAL t


Using the notation of Johnson & Welch (1939) and Resnikoff & Lieberman (1957), we shall
write

$$t = \frac{z + \delta}{\sqrt{w}}, \tag{1}$$

where $z$ is distributed normally about zero with unit standard deviation and $w$ is a quantity
distributed independently as $\chi^2/f$ with $f$ degrees of freedom. The probability density distribution
of $t$ may be expressed in the form‡

$$p(t \mid f, \delta) = \text{constant} \times \Big(1 + \frac{t^2}{f}\Big)^{-\frac{1}{2}(f+1)}\exp\Big\{-\frac{1}{2}\,\frac{f\delta^2}{f + t^2}\Big\}\,Hh_f\Big\{-\frac{t\delta}{\sqrt{(f + t^2)}}\Big\}, \tag{2}$$

where

$$Hh_f(x) = \int_0^\infty \frac{v^f}{f!}\exp\{-\tfrac{1}{2}(v + x)^2\}\,dv \tag{3}$$

is the tabled $Hh$-function (Fisher, 1931).

* For example, for $f = 8$ the lowest entry for $\delta$ is 2.03 and for $f = 20$ it is 3.09.
† Recently Harley (1957) has shown how an approximation which is probably as accurate as ours
can be obtained by a transformation of the distribution of the product-moment correlation coefficient $r$.
‡ The 'constant' contains $f$ but not $\delta$.

The Type IV curve can be written in the form

$$p(x \mid a, m, \nu) = y_0\Big(1 + \frac{x^2}{a^2}\Big)^{-m}\exp\{-\nu\tan^{-1}(x/a)\}, \tag{4}$$

where $-\infty < x < \infty$. It would therefore be tempting to equate $a$ to $\sqrt{f}$, $m$ to $\frac{1}{2}(f + 1)$ and to
determine empirically a value for $\nu$ which would bring roughly into agreement the last two
terms on the right-hand side of (2) and the last term in (4). This might be done by seeking
approximate agreement between some of the lower moments of the two distributions. Very
little would, however, be gained by this approach as no tables for the Type IV distribution
exist which can be entered with the parameters $a$, $m$ and $\nu$. One point of interest does
however arise from the comparison.

For the distribution of equation (4), $m$ is a function of the moment ratios $\beta_1$, $\beta_2$, in fact
$m = (5\beta_2 - 6\beta_1 - 9)/(2\beta_2 - 3\beta_1 - 6)$. Thus if we were to equate $m$ to $\frac{1}{2}(f + 1)$, we should have
a relation

$$\beta_2 = \frac{3}{2}\,\frac{f - 3}{f - 4}\,\beta_1 + 3\,\frac{f - 2}{f - 4}. \tag{5}$$

This implies that for constant $f$, the empirical Type IV distributions would have $\beta_1$, $\beta_2$ points
lying on a series of straight lines in the $\beta_1$, $\beta_2$ plane which all pass through the point $\beta_1 = -4$,
$\beta_2 = -3$ and cut the $\beta_2$ axis at the correct points, $\beta_2 = 3(f - 2)/(f - 4)$, for the central
$t$-distribution for which $\delta = 0$, $\beta_1 = 0$. Examination of Fig. 1 shows that the contours of the
true $\beta_1(t)$, $\beta_2(t)$ points for constant $f$ are in fact not far from straight lines, although the slope
of the best approximating lines is not quite that of equation (5). A better approximation is
provided by equation (10) given below.

Before attempting to relate the parameters $f$ and $\delta$ with those of a particular type of
approximating curve, it will be useful to examine the relation of the moment ratios $\beta_1(t)$ and
$\beta_2(t)$ to $f$ and $\delta$.

The first four moments of $t$ about zero are readily obtained from those of $z$ (normal) and
of the reciprocal of $\chi$, since

$$\mathcal{E}(t^s) = f^{\frac{1}{2}s}\,\mathcal{E}(z + \delta)^s \times \mathcal{E}(\chi^{-s}), \tag{6}$$

where

$$\mathcal{E}(\chi^{-s}) = 2^{-\frac{1}{2}s}\,\Gamma(\tfrac{1}{2}(f - s))/\Gamma(\tfrac{1}{2}f) \quad \text{for } f > s. \tag{7}$$

Thus

$$\begin{aligned}
\mu_1'(t) &= \sqrt{(\tfrac{1}{2}f)}\,\frac{\Gamma(\tfrac{1}{2}(f - 1))}{\Gamma(\tfrac{1}{2}f)}\,\delta, & \mu_2'(t) &= \frac{f}{f - 2}(1 + \delta^2),\\
\mu_3'(t) &= (\tfrac{1}{2}f)^{\frac{3}{2}}\,\frac{\Gamma(\tfrac{1}{2}(f - 3))}{\Gamma(\tfrac{1}{2}f)}\,\delta(3 + \delta^2), & \mu_4'(t) &= \frac{f^2}{(f - 2)(f - 4)}(3 + 6\delta^2 + \delta^4).
\end{aligned} \tag{8}$$

Central moments and moment ratios for particular values of $f$ and $\delta$ can be
calculated by first obtaining the numerical values of these moments about zero,
and then converting to moments about the mean or, perhaps more expediently, from
the relations

$$\begin{aligned}
\mu_1' &= \sqrt{(\tfrac{1}{2}f)}\,\frac{\Gamma(\tfrac{1}{2}(f - 1))}{\Gamma(\tfrac{1}{2}f)}\,\delta,\\
\mu_2 &= \frac{f}{f - 2}(1 + \delta^2) - \mu_1'^2,\\
\mu_3 &= \mu_1'\Big\{\frac{f(2f - 3 + \delta^2)}{(f - 2)(f - 3)} - 2\mu_2\Big\},\\
\mu_4 &= \frac{f^2}{(f - 2)(f - 4)}(3 + 6\delta^2 + \delta^4) - \mu_1'^2\Big\{\frac{f[(f + 1)\delta^2 + 3(3f - 5)]}{(f - 2)(f - 3)} - 3\mu_2\Big\}.
\end{aligned} \tag{9}$$

The value of $\mathcal{E}(\chi/\sqrt{f})$ is given to six decimal places in Biometrika Tables for Statisticians, 1,
Table 35 for $f = \nu = 1(1)20(5)50(10)100$, making the calculation of the mean straightforward
for low degrees of freedom. Using these equations, values of $\beta_1 = \mu_3^2/\mu_2^3$ and $\beta_2 = \mu_4/\mu_2^2$ were
calculated for every intersection of the $f$ and $\delta$ contours shown in Fig. 1.
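The calculation of $\beta_1(t)$ and $\beta_2(t)$ from the relations (9) can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' procedure: the gamma-function ratio is evaluated with `math.lgamma` instead of being read from Table 35, and the function name and returned dictionary keys are hypothetical.

```python
import math


def nct_moments(f, delta):
    """Mean, central moments and moment ratios of non-central t from relations (9).

    f: degrees of freedom (must exceed 4 for the fourth moment to exist);
    delta: non-centrality parameter.
    """
    if f <= 4:
        raise ValueError("the fourth moment requires f > 4")
    gamma_ratio = math.exp(math.lgamma((f - 1) / 2) - math.lgamma(f / 2))
    mean = math.sqrt(f / 2) * gamma_ratio * delta
    d2 = delta * delta
    mu2 = f / (f - 2) * (1 + d2) - mean**2
    mu3 = mean * (f * (2 * f - 3 + d2) / ((f - 2) * (f - 3)) - 2 * mu2)
    mu4 = (f**2 / ((f - 2) * (f - 4)) * (3 + 6 * d2 + d2**2)
           - mean**2 * (f * ((f + 1) * d2 + 3 * (3 * f - 5)) / ((f - 2) * (f - 3))
                        - 3 * mu2))
    return {"mean": mean, "mu2": mu2, "mu3": mu3, "mu4": mu4,
            "beta1": mu3**2 / mu2**3, "beta2": mu4 / mu2**2}


if __name__ == "__main__":
    for f, delta in ((10, 0.0), (10, 2.0), (20, 3.0)):
        r = nct_moments(f, delta)
        print(f, delta, f"beta1={r['beta1']:.4f}", f"beta2={r['beta2']:.4f}")
```

With $\delta = 0$ the function returns $\beta_1 = 0$ and $\beta_2 = 3(f - 2)/(f - 4)$, the central $t$ values mentioned in connexion with equation (5).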

It will be seen at once how nearly linear the f-contours are. An empirical calculation 
suggests that within the area of this diagram, the contours for f are represented quite well by 
the system of straight lines 

fh, = 1-406 — f+ oad 2) ; (10) 
t foe 
in place of the system of equation (5), which was obtained by equating $m$ of the Type IV
curve (4) to $\frac{1}{2}(f + 1)$.

A second point to be noted is the way in which the $\delta$-contours crowd up towards the
limiting curve for $\delta = \infty$. In the limit (as $\delta \to \infty$) $t/\delta$ is distributed as $\sqrt{f}/\chi$, which has for its
$s$th moment about zero

$$\mu_s' = (\tfrac{1}{2}f)^{\frac{1}{2}s}\,\Gamma(\tfrac{1}{2}(f - s))/\Gamma(\tfrac{1}{2}f). \tag{11}$$

Although the limiting moments of $t$ are infinite, the beta-coefficients are finite and were
determined from (11) for the values of $f$ used in the diagram, leading to the limiting contour
marked $\delta = \infty$. This curve lies just below the curve

$$\beta_1(\beta_2 + 3)^2 = 4(4\beta_2 - 3\beta_1)(2\beta_2 - 3\beta_1 - 6),$$

on which lie the $\beta_1$, $\beta_2$ points for the Pearson Type V curve, which forms the upper boundary
of the Type IV area. (See the chart of Table 43, Pearson & Hartley, 1954.) Thus the beta
points for the non-central $t$-distribution lie entirely within the Type IV area. It is of interest
to note that it is the reciprocal of $\chi^2$, not of $\chi$, which has a Type V distribution. Thus the
limiting form of the Type IV approximation does not provide the correct distribution law
as $\delta \to \infty$, although it does at the other boundary when $\delta \to 0$ and Type IV turns into
Type VII or Student's distribution.


Fig. 1. Contours of constant $f$ and $\delta$ in $\beta_1$, $\beta_2$ plane.


3. THE TYPE IV APPROXIMATION TO THE PERCENTAGE POINTS OF t


There are no available tables of the probability integral of a Type IV distribution, but four
upper and four lower percentage points for any distribution with $\beta_1 < 1.0$, $\beta_2 < 5.0$ can be
found by interpolation in Table 42 of Biometrika Tables for Statisticians, 1. We have made
comparisons of upper and lower 5, 1 and 0.5 % points for the 19 cases having the $f$ and $\delta$
values specified in Table 1. True values were obtained either from Resnikoff & Lieberman's
table of percentage points or from Johnson & Welch's (1939) tables. The approximating
values were found as follows:

(a) The first four moments of $t$ were calculated from equations (9).

(b) From these, values of $\beta_1(t)$, $\beta_2(t)$ were obtained.

(c) Linear interpolation in Table 42 then gave the standardized deviation for the six
percentage points.

(d) The approximate percentage points for $t$ were then calculated from the relation

percentage point for $t$ = $\mu_1'(t)$ + $\sigma(t)$ × standardized percentage point. (12)
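Step (d) can be sketched as follows. The mean and standard deviation come from the first two equations of (9); the standardized deviate must be supplied by the user from Table 42, which is not reproduced here, so the value used in the demonstration is purely a placeholder. Function names are illustrative.

```python
import math


def nct_mean_sd(f, delta):
    """Mean and standard deviation of non-central t (first two equations of (9)); needs f > 2."""
    if f <= 2:
        raise ValueError("the variance requires f > 2")
    gamma_ratio = math.exp(math.lgamma((f - 1) / 2) - math.lgamma(f / 2))
    mean = math.sqrt(f / 2) * gamma_ratio * delta
    variance = f / (f - 2) * (1 + delta**2) - mean**2
    return mean, math.sqrt(variance)


def percentage_point(f, delta, standardized_deviate):
    """Relation (12): mean + sd * standardized percentage point.

    standardized_deviate is assumed to have been interpolated from a table of
    standardized Pearson-curve percentage points for the (beta1, beta2) of t.
    """
    mean, sd = nct_mean_sd(f, delta)
    return mean + sd * standardized_deviate


if __name__ == "__main__":
    # 1.70 is a placeholder deviate, not a value taken from Table 42.
    print(percentage_point(f=10, delta=2.0, standardized_deviate=1.70))
```

Since tabulated deviates of this kind are given to two decimals, any rounding error in the deviate is multiplied by $\sigma(t)$ when it is carried into the percentage point.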


In so far as the β₁, β₂ values for given f and δ can be read from Fig. 1, it is only necessary to 
calculate the mean and standard deviation of t from the first two equations of (9), a very 
quick process if the mean value of χ/√f is available in Table 35 of Biometrika Tables for 
Statisticians, 1. However, as this simplification sometimes introduces small last-figure 
errors (the graph being difficult to read correctly between the contours), we carried out the 
full calculation for β₁, β₂ in every case. 

Two points should be noted about the accuracy of the comparison. In the first place the 
standardized deviates in Biometrika Table 42 are given to only two decimal 
places. After interpolation and multiplication by σ(t) (for which the values are given in the 
4th column of Table 1), it is clear that an error of 1 unit in the 2nd decimal place of the 
approximate percentage point must often occur, due solely to the limited scope of Table 42 
and not to the inadequacy of the Type IV approximation. 

Secondly, Resnikoff & Lieberman’s tables of percentage points of t/√f were calculated 
(see their Introduction, pp. 27-8) by six-point inverse Lagrangian interpolation in their 
4-decimal place tables of the probability integral. They record their percentage points 
to three decimal places and remark that these ‘are believed to be correct in the second 
decimal place throughout, and to differ occasionally from the true values by no more than 
one or two units in the third decimal’. It is therefore a little difficult to say what errors 
may be expected to result when these values of t/√f are multiplied by √f to give percentage 
points of t. 
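
The size of the two effects just described can be put in numbers with a small sketch. It is illustrative only: the function names are hypothetical, and the bounds are the simple worst-case products implied by the text (rounding of a two-decimal standardized deviate, and an error of one or two units in the third decimal of t/√f scaled up by √f).

```python
import math

def t_error_bound(f, error_in_ratio):
    """Error in t implied by a given error in the tabulated value of t/sqrt(f)."""
    return math.sqrt(f) * error_in_ratio

def deviate_rounding_error(sigma_t, decimals=2):
    """Largest error in the percentage point caused by rounding the
    standardized deviate to the stated number of decimals."""
    return sigma_t * 0.5 * 10 ** (-decimals)
```

For f = 49, an error of 0.002 in t/√f becomes 7 × 0.002 = 0.014 in t, which is consistent with differences above 0.01 appearing only at the largest values of f.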

However, on comparing the results of our Type IV approximation with the t values 
derived from Resnikoff & Lieberman’s percentage point tables or from Johnson & Welch’s 
tables, it was only for the large values of f of 34, 44 and 49 that we found any differences 
greater than 1 unit in the second decimal place. On going back to the probability integral 
table of Resnikoff & Lieberman for these f values and making fresh inverse interpolations 
we obtained rather different values for t/√f at the lower percentage points. When these 
adjusted values were multiplied by √f all the differences of over 0.01 disappeared. 

The figures in the last six columns of Table 1 contain these adjustments for f = 34, 44 
and 49; otherwise they have all been derived from Johnson & Welch’s or Resnikoff & 
Lieberman’s tables. 

Thus as far as we can tell, within the range of f and δ covered by this investigation, the 
Type IV approximation seems to provide values of both upper and lower 5, 1 and 0.5% 
points for non-central t which are not in error by more than 0.01. It is almost certain 
that in some cases the approximation could be more accurate if tables of standardized 
deviates of Type IV distributions were available to three decimal places. 

The points marked in Fig. 2 with double circles correspond to distributions with the 
lowest δ values (for given f) tabled by Resnikoff & Lieberman. Thus the Tables do not deal 
with non-central t-distributions represented by points in the area between the β₁ axis and 
the line joining these seven circles. While such distributions may not be of interest in 
situations most commonly met, it is clear that there is a wide class of distributions, with the 
central t-distribution at one boundary, which is not covered by the Resnikoff & Lieberman 
tables. It is in this region that the Type IV approximation is likely to be at its best. 

Fig. 2. Chart showing points at which the accuracy of the Type IV approximation was tested. The 
points surrounded with two circles correspond to lower limit δ values in R. and L. Tables. 


4. CONCLUSION 


The present investigation has shown that in the region of the upper and lower 5 to 0.5% points 
the Pearson Type IV curve provides a very good approximation to the non-central t-distribution 
over a wide range of values for f and δ. In full, this approximation involves calculating 
the first four moments of t from equations (9) and then using the table of standardized 
deviates (Biometrika Tables for Statisticians, 1, Table 42). For many purposes it would 
probably be adequate to calculate only μ′₁(t) and σ(t) and read off the β₁, β₂ values from the 
chart of Fig. 1. 

We realize, of course, that it is more likely in practice that the probability integral of 
non-central t rather than the percentage points will be required. We therefore regard our 
investigation as in part a contribution to a wider subject: the practical utility of tables 
which enable a single system of non-normal curves to be used in approximating to the 
percentage points of other untabled distributions, through the use of four moments. 

It is planned to take steps to extend Table 42 to give 3-decimal accuracy in the standardized 
percentage points, further points in addition to the eight already tabled and a more extended 
range of β₁, β₂ values. 


UPPER PERCENTAGE POINTS OF THE GENERALIZED BETA 
DISTRIBUTION. III 


By F. G. FOSTER 


Research Techniques Unit, London School of Economics 


1. INTRODUCTION 


These tables extend the tabulation of the 80, 85, 90, 95 and 99% points of I_x(k; p, q) to the 
case k = 4. They are a continuation of the tables for the cases k = 2, 3 given in Foster & 
Rees (1957) and Foster (1957). These papers will be referred to as ‘I’ and ‘II’, and reference 
is made to them for definitions. 

The interpolation requirements are similar to those for II. Some uses of the tables are 
indicated in I and II. 


2. METHOD OF COMPUTATION 


The computations were again carried out on the DEUCE Computer of the English Electric 
Company. Substantially the programme for II was used, with, of course, a modification 
for the computation of the function I_x(4; p, q). Let θ_max denote the greatest root of 

\[ |\nu_2 B - \theta(\nu_1 A + \nu_2 B)| = 0, \]

where A and B are independent estimates, based on ν₁ and ν₂ degrees of freedom, of a parent 
dispersion matrix of a four-dimensional multinormal distribution. Define 

\[ I_x(4; p, q) = \Pr\{\theta_{\max} \le x\}, \]

where p = ½(ν₂ − 3), q = ½(ν₁ − 3). Then by a similar method to that used in I and II, and 
employing a formula of Roy (1958), we obtain 


\[
\begin{aligned}
q(2q+1)\, I_x(4; p, q) ={}& I_x(2p+4, 2q)\, I_x(2; p, q)\, (p+1)(2p+3)(2p+2q+1)(p+q+1) \\
&- I_x(2p+3, 2q)\, I_x(2; p, q)\, (p+1)^2 (2p+2q+1)(2p+2q+3) \\
&+ I_x(2p+2, 2q)\, I_x(2; p+1, q)\, p(2p+1)(p+q+1)(2p+2q+3) \\
&- I_x(2p+1, 2q)\, I_x(2p+3, 2q)\, p(p+1)(2p+2q+1)(2p+2q+3) \\
&+ I_x(2p+3, 2q)\, I_x(p, q)\, a_x(p+1, q)\, (2p+1)(p+1)(p+q)(2p+2q+3) \\
&- I_x(2p+2, 2q)\, I_x(p, q)\, a_x(p+2, q)\, (2p+1)(2p+3)(p+q)(p+q+1) \\
&+ I_x(2p+1, 2q)\, I_x(p+1, q)\, a_x(p+2, q)\, p(2p+3)(2p+2q+1)(p+q+1) \\
&+ I_x(2; p, q)\, b_x(2p+2, 2q)\, (p+1)^2 (2p+2q+1)(2p+2q+3).
\end{aligned}
\]


The ranges of p and q were chosen as in II, namely, p = ½(½)4, q = 1(1)96. On this 
occasion a simplification of the programme was introduced by use of the same relations for 
the integral and half-integral values of p in the computation of all of I_x(p, q), a_x(p, q) and 
b_x(p, q). These relations were: 

\[
\begin{aligned}
I_x(p, 0) &= 0 && (p \ge p_0), \\
I_x(p_0, 1) &= x^{p_0}, \\
I_x(p+1, q+1) &= x\, I_x(p, q+1) + (1-x)\, I_x(p+1, q) && (p \ge p_0,\ q \ge 0), \\
I_x(p_0, q+1) &= I_x(p_0, q) + \frac{p_0}{q}\,\{I_x(p_0, q) - I_x(p_0+1, q)\} && (q \ge 1);
\end{aligned}
\]
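\[
\begin{aligned}
a_x(p_0, 0) &= x^{p_0}, \\
a_x(p+1, 0) &= x\, a_x(p, 0) && (p \ge p_0), \\
a_x(p_0, 1) &= (2p_0+1)(1-x)\, x^{p_0}, \\
a_x(p+1, q+1) &= \Bigl[1 + \frac{1}{2(p+q+1)}\Bigr]\{x\, a_x(p, q+1) + (1-x)\, a_x(p+1, q)\} && (p \ge p_0,\ q \ge 0), \\
a_x(p_0, q+1) &= \frac{2p_0+2q+1}{2q+1}\,(1-x)\, a_x(p_0, q) && (q \ge 1);
\end{aligned}
\]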


4 1 
be( Pos 1) =P “arot(1 —2) 


b] 


Po 
b(p+ lat PP TEt?) iy 1)+(1—2)b 1.q)} (p>p» 72>), 
( P qt+lj= 1+p(p+9q42)° (P,gt+1)+(l—x)b,(p+1,q)} (p>Po, 729) 
+q+1 
b.(Po,q+1) = Pett = (1-2) b,( p,q) (q> 0). 


In these relations p₀ was given the value either ½ or 1. (Inspection of the formula for 
I_x(4; p, q) shows that in fact both values of p₀ are used for I_x(p, q), but only p₀ = ½ for 
a_x(p, q) and only p₀ = 1 for b_x(p, q).) 

The percentage points were then obtained exactly as in II. 
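
The recurrence scheme for I_x(p, q) can be illustrated with a short Python sketch. It is not the DEUCE programme: the function name and the table layout are hypothetical, and the step in q at p = p₀ is written in the standard form I_x(p₀, q+1) = I_x(p₀, q) + (p₀/q){I_x(p₀, q) − I_x(p₀+1, q)}, which is an assumption wherever the printed coefficient is unclear. The other three relations used are I_x(p, 0) = 0, I_x(p₀, 1) = x^p₀ and I_x(p+1, q+1) = x I_x(p, q+1) + (1 − x) I_x(p+1, q).

```python
def incomplete_beta_table(x, p0, n_p, n_q):
    """Return table[i][j] = I_x(p0 + i, j) for i = 0..n_p and j = 0..n_q.

    p0 may be 0.5 or 1, so one routine serves both the half-integral
    and the integral values of p.
    """
    rows = n_p + 2  # one extra row: the q-step at p0 needs I_x(p0 + 1, q)
    table = [[0.0] * (n_q + 1) for _ in range(rows)]  # I_x(p, 0) = 0
    if n_q == 0:
        return table[: n_p + 1]
    table[0][1] = x ** p0  # I_x(p0, 1) = x^p0
    for j in range(n_q):
        if j >= 1:
            # I_x(p0, q+1) = I_x(p0, q) + (p0/q) {I_x(p0, q) - I_x(p0+1, q)}
            table[0][j + 1] = table[0][j] + (p0 / j) * (table[0][j] - table[1][j])
        for i in range(rows - 1):
            # I_x(p+1, q+1) = x I_x(p, q+1) + (1 - x) I_x(p+1, q)
            table[i + 1][j + 1] = x * table[i][j + 1] + (1 - x) * table[i + 1][j]
    return table[: n_p + 1]
```

Closed forms give a check on the scheme: I_x(2, 2) = 3x² − 2x³ and I_x(½, 2) = √x (3 − x)/2.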


3. FURTHER WORK 


The existing programme could now be used with little modification for extensions of any 
of the Tables 1, 2 or 3 to higher values of p or q. It could also be used for computing 
percentage points of the β-distribution, I_x(p, q), itself. The method of computation is probably 
feasible for k = 5, but this would be near the limit of its usefulness, since provision for 
round-off errors makes the computation increasingly lengthy. Beyond this point, a new theoretical 
approach is probably required. No further tabulations are presently contemplated. 


The author is again indebted to the staff of the London Computing Service for assistance 
and to the Director of Research of the English Electric Company for permission to use their 
DEUCE Computer. 
0-9725 
-9799 
‘9869 
-9936 
-9987 

0-8902 
-9078 
-9272 
*9505 
‘9789 

0-8034 
+8264 
+8532 
*8882 
‘9381 

0-7259 
-7514 
-7820 
*8236 
“8885 

0-6594 
-6858 
-7180 
-7632 
+8374 

0-6028 
-6292 
-6619 
-7085 
*7882 

0-5544 

“5804 

-6128 

-6599 

-7424 

-5128 

‘5381 

-5700 

-6166 

-7003 

*4768 

-5013 

-5323 

‘5782 

-6619 

0-4453 
-4690 
-4991 
+5439 
-6270 

0-9779 
-9839 
-9895 
-9949 
-9990 


0:9083 
9231 
-9394 
-9589 
-9825 

0-8311 
*8512 
*8744 
-9045 
-9473 

0:7596 
*7824 
*8095 
-8463 
-9032 

0-6965 
-7206 
*7497 
-7904 
+8567 

0-6417 

-6661 

-6961 

-7387 

-8110 

-5940 

-6183 

“6485 

-6920 

7677 

0-5524 

5763 

-6063 

-6499 

*7275 

-5160 

+5394 

-5688 

-6120 

-6903 

0-4839 
-5066 
+5353 
-5779 
*6561 


Oo 


So 


6 


0:9816 
-9865 
-9913 
*9957 
“9992 

0-9212 
-9340 
-9480 
-9648 
-9850 

0-8518 
*8695 
“8900 
-9166 
-9541 

0-7856 
-8062 
-8306 
-8636 
-9144 

0-7259 
*7479 
-7746 
-8116 
-8716 


0:6730 
-6957 
-7235 
-7628 
-8290 

0-6264 
-6492 
-6774 
-7180 
‘7881 

0-5853 
-6079 
-6361 
‘6771 
*7495 

0-5489 
-5712 
“5991 
-6400 
-7136 

0-5165 
-5383 
+5658 
-6064 
-6803 


0-9842 
“9885 
-9925 
-9963 
-9993 

0-9309 
*9422 
*9545 
-9692 
*9869 

0-8678 
*8838 
-9022 
*9259 
9593 

0-8063 
*8251 
+8473 
8773 
-9231 

0-7498 
‘7701 
*7947 
+8287 
-8836 

0-6989 
-7201 
-7460 
+7825 
*8436 

0-6536 
“6751 
-7016 
-7396 
-8049 

0-6131 
6346 
‘6614 
-7000 
-7680 

0-5770 
-5983 
-6250 
-6638 
*7333 

0-5447 
+5656 

+5920 
*6307 
‘7010 


0-9862 
-9899 
-9934 
-9968 
-9994 

0-9385 
9485 
*9595 
‘9726 
-9884 

0-8807 
“8952 
‘9118 
9333 
-9634 


0-8233 
*8405 
-8610 
“8884 
-9302 


0-7697 
-7886 
*8114 
-8429 
*8934 

0-7208 
-7407 
-7650 
-7991 
+8559 

0-6768 
-6971 
*7222 
*7579 
-8190 

0-6372 

-6576 

-6830 

-7197 

*7837 

-6015 

-6219 

-6474 

-6844 

-7502 

-5693 

-5895 

-6148 

-6519 

‘7188 


Oo 


o 








Generalized Beta distribution: 100P% points for x 

0-9877 
-9910 
-9942 
*9972 
-9994 

0-9445 
*9536 
-9636 
‘9754 
-9895 

0-8913 
*9045 
-9197 
-9393 
-9668 


0-8374 
“8534 
*8723 
*8976 
-9361 

0-7865 
*8042 
+8255 
*8548 
-9017 


0-7396 
-7584 
*7812 
*8131 
*8663 


0-6969 
-7162 
-7399 
‘7737 
“8312 

0-6582 
*6777 
-7019 
-7367 


‘7973 


0-6231 
6427 
-6670 
7024 
:7650 

0-5912 
‘6107 
-6350 
-6706 
7345 


This table gives the values of x for which Pr(θ_max ≤ x) = I_x(4; p, q) = P, where p = ½(ν₂ − 3), q = ½(ν₁ − 3). 










0-9889 
-9919 
*9947 
-9974 
*9995 


0-9495 
*9578 
-9668 
‘9776 
*9905 

0-9001 
*9123 
-9263 
9443 
-9696 


0-8494 
-8643 
-8819 
9054 
-9410 

0-8010 
-8176 
-8376 
-8650 
-9088 

0:7559 
1737 
‘7952 
+8253 
+8752 

0-71.46 
-7329 
1554 
‘7874 
‘8417 

0-6768 
6955 
‘7185 
‘7517 
-8092 

0-6423 
-6611 
6845 
-7183 
-7780 

0-6108 
-6296 
-6530 





11 


0-9899 
*9926 
*9952 
‘9977 
9995 

0-9536 
9613 
-9696 
“9795 
-9913 


0-9075 
-9189 
-9319 
*9486 
*9719 

0-8598 
*8737 
-8901 
*9121 
*9453 


0-8136 
+8293 
*8481 
*8738 
*9149 


0-7703 
‘7871 
*8075 
-8300 
-8830 

0-7302 
*7477 

| +7691 
*7995 
*8510 

0-6933 
°7112 
+7333 








‘7649 
‘8197 
| 06595 

6776 | 

| -7000 | 

| °7325 

‘7895 | 

0-6285 | 







36 | 27 


75 | 29 


98 _ 3 


36 | 33 


103 | 35 


-— 
or 


302 37 





VS 
— © 
oo 


933 39 





285 | 43 





| 
4 | 5 | 
ae 
0-4177 | 0-4554 | 
-4405 -4774 
-4696 -5054 
5133 | +5472 
5951 | -6247 
0-3932 | 0-4300 
‘4152 | -4513 
4433 | -4786 
-4858 +5194 
-5661 +5959 
0-3713  0-4072 
+3925 -4279 
-4197 -4543 
-4609 -4941 
-5395 -5694 
0-3518 | 0-3867 
-3722 -4067 
‘3985 |  -4323 
4384 | -4711 
‘5152 | = +5450 
0-3341 | 0-3681 
-3538 -3874 
-3792 -4123 
-4180 -4500 
-4929 5224 
0-3181 | 03511 
-3371 -3699 
-3617 -3940 
-3993 -4307 
4724 -5016 
0-3036  0-3357 
-3220 -3538 
+3457 -3773 
-3822 -4130 
-4534 -4823 
0-2903  0-3215 
-3081 -3391 
‘3311 -3618 
-3664 -3966 
-4359 4644 
0-2782 | 0-3085 
-2953 “3255 
-3176 -3476 
-3519 -3814 
‘4196 | -4476 
0-2670 | 0-2965 
2836 | -3130 
-3051 +3345 
+3385 -3674 
4044 -4321 




Generalized Beta distribution (cont.) 
0-4876 
-5089 
-5358 
*5758 
-6495 

0-4616 
-4823 
-5086 
-5479 
-6211 

0-4382 
-4583 
-4840 
*5225 
5948 

0-4170 
-4365 
-4615 
-4992 
-5704 

0-3977 
-4167 
-4410 
-4778 
-5478 

0-3801 
-3985 
+4222 
*4581 
-5268 

0-3639 
-3818 
-4049 
-4399 
-5073 

0-3490 
+3665 
-3889 
-4230 
-4891 

0-3353 
+3523 
“3741 
-4074 
-4722 


0:3227 


-3391 
-3604 
-3929 
4563 


0-5155 
-5361 
+5621 
-6004 
-6708 

0:4893 
-5094 
-5348 
-5727 
-6428 

0-4655 
“4851 
+5106 
+5472 
-6168 

0-4438 
+4629 
-4873 
+5238 
-5926 

0-4240 
-4426 
-4664 
-5022 
-5700 

0-4058 
-4240 
*4472 
-4822 
+5490 

0:3891 
-4068 
+4295 
-4637 
+5294 

0:3738 
-3910 
-4131 
-4466 
-5111 

0°3595 
-3763 
-3978 
-4306 
-4939 

0:3463 
+3626 
-3837 
‘4157 
‘4778 


| 
| 
| 


8 


0-5402 
-5601 
“5851 
+6220 
-6894 

0-5138 
-5333 
-5580 
-5945 
-6619 

0-4898 
-5089 
-5331 
-5691 
-6362 

0-4678 
-4865 
-5102 
*5457 
-6122 

0:4476 
-4659 
-4891 
-5240 
-5898 

04291 
-4469 
-4697 
-5039 
-5688 

0-4120 
-4294 
-4516 
*4852 
-5492 

0-3962 
-4132 
-4349 
-4678 
-5308 

0-3815 
-3981 
-4193 
-4516 
-5135 

0:3679 
*3841 
-4048 
-4364 
-4973 


This table gives the values of x for which Pr(θ_max ≤ x) = I_x(4; p, q) = P, where p = ½(ν₂ − 3), q = ½(ν₁ − 3). 


0-5622 
-5815 
+6056 
-6412 
*7057 

0-5358 

+5548 

‘5786 

-6139 

-6787 

-5116 

+5303 

+5538 

-5887 

-6534 

0-4895 
-5077 
+5309 
-5654 
-6297 

0-4691 
-4870 
-5097 
-5437 
-6075 

0:4503 
-4678 
-4901 
+5235 
-5867 

0-4329 
-4500 
-4718 
-5047 
-5671 

0-4167 
-4335 
-4549 
‘4871 
+5487 

0-4017 
-4181 
-4390 
-4707 
-5314 

0-3878 
-4037 
+4242 
-4553 
*5151 


So 


0-5820 
-6007 
-6240 
-6583 
-7203 

0-5557 
-5741 
-5973 
-6314 
-6938 

0-5315 
-5496 
*5725 
-6064 
-6690 

0-5092 
-5271 
+5496 
-5832 
6456 

0-4887 
-5062 
+5284 
-5616 
*6236 

0-4697 
-4869 
-5087 
-5414 
+6029 

0-4521 
-4689 
-4903 
+5225 
-5834 

0-4357 
-4522 
-4732 
-5048 
+5650 

04204 
-4365 
-4572 
-4883 
*5477 

0-4061 
*4219 
-4422 
*4727 
-5313 


0-6000 
-6180 
-6406 
-6737 
*7334 

0-5738 
-5917 
-6141 
-6472 
-7075 


0-5496 
-5673 
-5896 
+6225 
-6830 

0-5274 
-5448 
-5668 
-5995 
-6600 

0-5067 
-5239 
+5456 
‘5779 
-6382 

0:4876 
-5044 
-5258 
+5578 
*6177 

0-4698 
-4863 
-5074 
-5389 
-5983 

0-4532 
-4695 
-4901 
-5212 
-5800 

0-4377 
*4537 
-4740 
-5046 
-5627 

0:4233 
-4389 
-4588 
-4889 
+5464 





| 


| 






Generalized Beta distribution (cont.) 

                            ν₂
 ν₁     P          4       5       6       7       8       9      10      11

  45   0.80   0.2566  0.2853  0.3109  0.3340  0.3552  0.3747  0.3928  0.4097
       0.85   0.2727  0.3014  0.3269  0.3499  0.3710  0.3903  0.4083  0.4250
       0.90   0.2936  0.3222  0.3476  0.3704  0.3913  0.4104  0.4281  0.4446
       0.95   0.3260  0.3543  0.3793  0.4017  0.4221  0.4409  0.4581  0.4742
       0.99   0.3903  0.4175  0.4414  0.4627  0.4820  0.4997  0.5159  0.5309

  47   0.80   0.2471  0.2750  0.2999  0.3226  0.3433  0.3625  0.3803  0.3969
       0.85   0.2626  0.2906  0.3155  0.3381  0.3587  0.3778  0.3954  0.4119
       0.90   0.2830  0.3109  0.3357  0.3581  0.3786  0.3974  0.4149  0.4312
       0.95   0.3144  0.3421  0.3666  0.3887  0.4088  0.4273  0.4444  0.4603
       0.99   0.3772  0.4039  0.4274  0.4485  0.4676  0.4851  0.5013  0.5162

  49   0.80   0.2382  0.2654  0.2897  0.3119  0.3322  0.3510  0.3685  0.3850
       0.85   0.2533  0.2806  0.3049  0.3270  0.3472  0.3660  0.3834  0.3997
       0.90   0.2730  0.3003  0.3246  0.3465  0.3666  0.3852  0.4024  0.4185
       0.95   0.3036  0.3307  0.3548  0.3764  0.3963  0.4145  0.4314  0.4471
       0.99   0.3648  0.3911  0.4143  0.4351  0.4540  0.4714  0.4874  0.5023

  51   0.80   0.2299  0.2564  0.2802  0.3018  0.3218  0.3403  0.3575  0.3737
       0.85   0.2446  0.2712  0.2950  0.3166  0.3365  0.3549  0.3720  0.3881
       0.90   0.2638  0.2904  0.3141  0.3357  0.3554  0.3737  0.3907  0.4066
       0.95   0.2936  0.3200  0.3436  0.3649  0.3844  0.4024  0.4191  0.4347
       0.99   0.3533  0.3791  0.4019  0.4224  0.4412  0.4584  0.4743  0.4891

  53   0.80   0.2222  0.2480  0.2713  0.2924  0.3120  0.3301  0.3471  0.3630
       0.85   0.2364  0.2624  0.2857  0.3068  0.3263  0.3444  0.3613  0.3771
       0.90   0.2551  0.2811  0.3043  0.3255  0.3449  0.3629  0.3796  0.3953
       0.95   0.2841  0.3100  0.3332  0.3541  0.3733  0.3910  0.4075  0.4229
       0.99   0.3424  0.3677  0.3902  0.4105  0.4290  0.4460  0.4618  0.4765

  55   0.80   0.2149  0.2402  0.2629  0.2836  0.3028  0.3206  0.3372  0.3529
       0.85   0.2288  0.2542  0.2769  0.2977  0.3168  0.3346  0.3512  0.3668
       0.90   0.2470  0.2724  0.2951  0.3159  0.3349  0.3526  0.3691  0.3846
       0.95   0.2752  0.3006  0.3233  0.3439  0.3628  0.3802  0.3965  0.4117
       0.99   0.3322  0.3571  0.3792  0.3992  0.4175  0.4343  0.4499  0.4645

  57   0.80   0.2082  0.2328  0.2550  0.2753  0.2941  0.3116  0.3279  0.3434
       0.85   0.2217  0.2464  0.2687  0.2890  0.3078  0.3253  0.3416  0.3570
       0.90   0.2394  0.2642  0.2865  0.3068  0.3255  0.3429  0.3592  0.3745
       0.95   0.2669  0.2918  0.3140  0.3342  0.3528  0.3700  0.3861  0.4011
       0.99   0.3225  0.3470  0.3687  0.3885  0.4065  0.4232  0.4387  0.4531

  59   0.80   0.2018  0.2258  0.2475  0.2674  0.2858  0.3030  0.3191  0.3343
       0.85   0.2150  0.2391  0.2609  0.2808  0.2993  0.3164  0.3325  0.3476
       0.90   0.2322  0.2565  0.2783  0.2983  0.3167  0.3338  0.3498  0.3648
       0.95   0.2591  0.2834  0.3052  0.3251  0.3434  0.3603  0.3762  0.3910
       0.99   0.3134  0.3374  0.3589  0.3783  0.3961  0.4126  0.4279  0.4422

  61   0.80   0.1958  0.2193  0.2405  0.2600  0.2781  0.2949  0.3108  0.3257
       0.85   0.2087  0.2323  0.2536  0.2731  0.2912  0.3081  0.3239  0.3388
       0.90   0.2254  0.2492  0.2706  0.2902  0.3082  0.3251  0.3408  0.3557
       0.95   0.2517  0.2755  0.2969  0.3164  0.3344  0.3511  0.3668  0.3814
       0.99   0.3048  0.3284  0.3495  0.3686  0.3862  0.4025  0.4177  0.4319

  63   0.80   0.1902  0.2131  0.2339  0.2530  0.2707  0.2873  0.3029  0.3176
       0.85   0.2027  0.2258  0.2467  0.2658  0.2836  0.3001  0.3157  0.3304
       0.90   0.2191  0.2423  0.2633  0.2825  0.3002  0.3168  0.3323  0.3470
       0.95   0.2447  0.2680  0.2890  0.3082  0.3259  0.3424  0.3578  0.3723
       0.99   0.2966  0.3198  0.3406  0.3594  0.3768  0.3929  0.407   0.4219

This table gives the values of x for which Pr(θ_max ≤ x) = I_x(4; p, q) = P, where p = ½(ν₂ − 3), q = ½(ν₁ − 3). 
67 


69 


71 


73 


75 


77 


79 


81 


83 


This table gives the values of x for which Pr(θ_max ≤ x) = I_x(4; p, q) = P, where p = ½(ν₂ − 3), q = ½(ν₁ − 3). 


Generalized Beta distribution (cont.) 





| 


0-1849 
1971 
*2131 
-2381 
+2889 

0-1798 
-1917 
+2074 
+2318 
+2816 

0-1751 
-1867 
-2019 
+2259 
-2746 

0-1706 
-1819 
-1968 
-2202 
+2679 

0-1663 
*1774 
-1919 
+2148 
-2616 


0-1622 
-1730 
-1873 
+2097 
*2555 

0-1583 
-1689 
-1829 
+2048 
-2498 

0-1546 
-1650 
-1787 
+2002 
+2442 

0-1510 
-1612 
-1746 
*1957 
+2390 

0-1477 
-1576 
-1708 
-1914 
+2339 


+2479 
-2966 
0-1915 
-2031 
+2182 
-2418 
+2896 
-1868 
-1981 
-2129 
+2360 
-2828 
0-1823 
-1934 
-2078 
-2305 
-2764 
0-1780 
-1888 
-2030 
+2252 
-2703 
0-1739 
-1845 
-1984 
+2202 
-2645 
0-1700 
*1804 
-1940 
+2154 
+2589 
0-1663 
-1765 
-1898 
+2108 
+2535 


S 


0-2217 
+2339 
-2498 
+2745 
-3240 

02160 
+2280 
-2435 
-2677 
-3163 

0-2107 
-2223 
-2376 
-2613 
-3090 


0-2055 
-2170 
-2319 
*2552 
+3020 

0-2007 
-2119 
+2265 
*2493 
+2953 

0-1960 
-2070 
+2214 
+2437 
-2889 


0-1916 
-2024 
-2164 
+2384 
+2827 


0-1874 
-1979 
*2117 
2333 
+2769 
-1823 
-1937 
-2072 
+2284 
*2712 


o 


0-2463 
+2589 
*2752 
-3004 
+3507 


0-2400 
+2523 
-2683 
+2930 
+3424 

0-2340 
+2460 
-2617 
+2859 
"3344 

0-2283 
-2401 
+2554 
*2792 
+3268 

0:2229 
+2344 
-2494 
-2728 
+3195 

0-2177 
+2290 
*2437 
+2666 
-3126 

0-2127 
+2238 
+2383 
-2608 
*3059 


0-2080 
-2189 
-2331 
2551 
-2995 
-2035 
2141 
-2281 
-2497 
-2934 
0-1991 

-2096 

2233 

+2446 

2875 


o 


0-2637 
+2763 
*2927 
-3178 
+3678 

0-2571 
-2694 
+2854 
-3101 
+3592 

0-2508 

+2629 

-2786 

-3028 

*3511 

+2448 

+2566 
+2720 

-2958 

+3432 

0-2390 

-2506 

-2658 

*2891 

+3357 

-2336 

+2450 

+2598 
+2827 

-3286 

0:2283 
-2395 
*2541 
-2766 
°3217 

0-2233 
-2343 
-2486 
-2707 
“3151 

0-2185 

-2293 

+2434 

+2651 
+3088 

-2140 

+2245 

+2383 

*2597 


+3027 


So 


S 


o 








0-2800 | 0-2953 
+2926 3079 
-3089 3242 
-3341 | +3493 
‘3837 |  -3985 

02731 | 0-2881 
+2854 -3005 
3015 | -3165 
3261 | -3411 
-3749 | -3896 

0:2665 | 0-2813 
-2786 +2935 
-2943 3092 
3185 -3333 
-3666 3811 

02602 | 0-2748 
2721 -2867 
2875 -3021 
-3113 -3259 
3585 3729 

0-2542 | 0-2686 
+2659 +2803 
-2810 +2954 
3044 -3188 
+3509 3650 

0-2485 | 0:2626 
+2599 2741 
2748 -2890 
-2978 -3119 

| 3435 +3575 

02430 | 0-2569 
2543 -2682 
-2689 -2828 
-2914 3054 
3364 +3502 

| 0-2378 | 0-2514 
| +2488 +2625 
-2632 -2769 
+2853 -2991 
+3296 -3433 

0:2327 0:2462 
2436 2571 
2577 2713 

2795 -2931 
3231 3366 
0:2279 | 0-2412 

| +2386 | +2519 
| +2525 | -2658 
| .2739 | -2873 
| -3168 -3302 


| 


0-3098 
3224 
-3387 
-3636 

| +4125 

0-3024 
-3148 
-3307 
+3552 
-4034 
0-2953 | 
+3075 | 
-3232 
3473 | 
*3947 | 
| 0-2886 
-3005 | 
| 
| 


| 





+3160 
+3396 
+3864 


0-2822 | 
2939 | 
-3090 | 

| +3323 | 

| +3784 | 

| 0-2760 

| +2875 

“3024 
"3253 | 

‘3707 | 

| 


0-2701 
-2814 
-2961 
+3186 
+3633 


| 

0-2644 | 

| +2756 | 
| | 


-2900 
+3122 
+3562 

0-2590 | 
-2700 | 
| 2842 | 

| 3060 

+8494 

0-2538 | 

2646 | 
2785 
-3000 
3428 






































































Generalized Beta distribution (cont.) 

                            ν₂
 ν₁     P          4       5       6       7       8       9      10      11

  85   0.80   0.1444  0.1627  0.1794  0.1950  0.2096  0.2233  0.2364  0.2488
       0.85   0.1542  0.1727  0.1896  0.2053  0.2200  0.2338  0.2469  0.2594
       0.90   0.1671  0.1858  0.2029  0.2187  0.2335  0.2474  0.2606  0.2731
       0.95   0.1874  0.2064  0.2236  0.2396  0.2545  0.2685  0.2818  0.2943
       0.99   0.2290  0.2483  0.2658  0.2819  0.2968  0.3108  0.3240  0.3365

  87   0.80   0.1413  0.1593  0.1757  0.1910  0.2053  0.2189  0.2317  0.2440
       0.85   0.1509  0.1691  0.1857  0.2011  0.2155  0.2292  0.2421  0.2544
       0.90   0.1635  0.1819  0.1987  0.2143  0.2289  0.2426  0.2556  0.2680
       0.95   0.1834  0.2021  0.2191  0.2349  0.2496  0.2634  0.2764  0.2888
       0.99   0.2244  0.2434  0.2606  0.2764  0.2912  0.3050  0.3180  0.3304

  89   0.80   0.1384  0.1560  0.1721  0.1872  0.2013  0.2146  0.2273  0.2394
       0.85   0.1478  0.1656  0.1819  0.1971  0.2113  0.2247  0.2375  0.2496
       0.90   0.1602  0.1782  0.1948  0.2101  0.2244  0.2379  0.2508  0.2629
       0.95   0.1797  0.1981  0.2148  0.2303  0.2448  0.2584  0.2713  0.2835
       0.99   0.2199  0.2386  0.2556  0.2712  0.2857  0.2994  0.3123  0.3245

  91   0.80   0.1355  0.1528  0.1687  0.1835  0.1974  0.2105  0.2230  0.2349
       0.85   0.1447  0.1623  0.1783  0.1932  0.2073  0.2205  0.2330  0.2450
       0.90   0.1569  0.1747  0.1909  0.2060  0.2201  0.2335  0.2461  0.2581
       0.95   0.1761  0.1942  0.2106  0.2259  0.2402  0.2536  0.2663  0.2784
       0.99   0.2156  0.2340  0.2507  0.2661  0.2805  0.2940  0.3067  0.3188

  93   0.80   0.1328  0.1498  0.1654  0.1800  0.1936  0.2066  0.2189  0.2306
       0.85   0.1418  0.1591  0.1749  0.1895  0.2033  0.2164  0.2288  0.2406
       0.90   0.1538  0.1713  0.1872  0.2021  0.2160  0.2291  0.2416  0.2535
       0.95   0.1726  0.1904  0.2066  0.2217  0.2357  0.2490  0.2615  0.2734
       0.99   0.2115  0.2296  0.2461  0.2613  0.2755  0.2888  0.3014  0.3133

  95   0.80   0.1302  0.1469  0.1622  0.1765  0.1900  0.2028  0.2149  0.2265
       0.85   0.1391  0.1560  0.1715  0.1860  0.1996  0.2124  0.2246  0.2363
       0.90   0.1508  0.1680  0.1837  0.1983  0.2120  0.2250  0.2373  0.2490
       0.95   0.1693  0.1868  0.2028  0.2176  0.2314  0.2445  0.2569  0.2687
       0.99   0.2075  0.2253  0.2416  0.2566  0.2706  0.2837  0.2962  0.3080

  97   0.80   0.1277  0.1441  0.1592  0.1733  0.1865  0.1991  0.2111  0.2225
       0.85   0.1364  0.1530  0.1683  0.1826  0.1959  0.2086  0.2206  0.2321
       0.90   0.1479  0.1648  0.1803  0.1947  0.2082  0.2210  0.2331  0.2447
       0.95   0.1661  0.1833  0.1990  0.2136  0.2273  0.2402  0.2524  0.2641
       0.99   0.2037  0.2213  0.2373  0.2521  0.2659  0.2789  0.2912  0.3028

  99   0.80   0.1252  0.1414  0.1562  0.1701  0.1832  0.1956  0.2074  0.2186
       0.85   0.1338  0.1502  0.1652  0.1792  0.1924  0.2049  0.2168  0.2281
       0.90   0.1451  0.1618  0.1770  0.1912  0.2045  0.2171  0.2291  0.2405
       0.95   0.1630  0.1800  0.1955  0.2098  0.2233  0.2360  0.2481  0.2596
       0.99   0.2000  0.2173  0.2331  0.2477  0.2614  0.2742  0.2863  0.2979

 101   0.80   0.1229  0.1388  0.1534  0.1671  0.1799  0.1921  0.2038  0.2149
       0.85   0.1313  0.1474  0.1622  0.1761  0.1890  0.2014  0.2131  0.2243
       0.90   0.1425  0.1588  0.1738  0.1878  0.2009  0.2134  0.2252  0.2364
       0.95   0.1600  0.1767  0.1920  0.2062  0.2195  0.2320  0.2439  0.2553
       0.99   0.1964  0.2135  0.2291  0.2435  0.2570  0.2696  0.2816  0.2931

 103   0.80   0.1207  0.1363  0.1507  0.1641  0.1768  0.1888  0.2003  0.2113
       0.85   0.1289  0.1448  0.1594  0.1730  0.1858  0.1979  0.2095  0.2205
       0.90   0.1399  0.1560  0.1708  0.1846  0.1975  0.2098  0.2214  0.2325
       0.95   0.1572  0.1736  0.1887  0.2026  0.2158  0.2281  0.2399  0.2511
       0.99   0.1930  0.2098  0.2252  0.2394  0.2527  0.2652  0.2771  0.2884

This table gives the values of x for which Pr(θ_max ≤ x) = I_x(4; p, q) = P, where p = ½(ν₂ − 3), q = ½(ν₁ − 3). 





Generalized Beta distribution (cont.) 







105 


107 


109 


111 


113 


115 


117 


119 


121 


123 





0-1124 
-1202 
-1304 
-1467 
-1803 

0-1105 
-1182 
-1283 
-1443 
*1774 

0:1087 
+1162 
-1262 
-1419 
+1746 

0-1070 
-1144 
-1242 
-1397 
-1719 

0-1052 
+1125 
-1222 
+1375 
-1692 

0-1036 
-1108 
-1203 
-1354 
+1667 

0-1020 
“1091 
-1184 
-1333 
-1642 


0-1339 
+1422 
1533 
-1706 
-2063 

0-1315 
-1398 
-1506 
‘1677 
-2028 

0-1293 
“1374 
1481 
-1649 
1995 

0-1271 
“1351 
1456 
1622 
1963 

0-1250 
“1329 
1432 
1595 
1932 

0-1230 
“1307 
-1409 
1570 
1901 

0-1210 
1286 
-1387 
1545 
-1872 

0-1191 
1266 
1365 
‘1521 
‘1844 

0-1172 
1246 
-1344 
1498 
1816 

0-1154 
“1227 
“1324 
1476 
-1789 


01480 
1566 
-1678 
“1854 
‘2214 

0-1455 
-1539 
1650 
1823 
-2178 

01430 
‘1513 
1622 
1793 
2143 

01407 
1488 
-1596 
“1764 
-2108 

0-1384 
1464 
1570 
1736 
2075 

0-1361 
1441 
1545 
-1708 
2043 

0-1340 
1418 
1521 
1682 
-2012 

0-1319 
1396 
“1497 
1656 
1982 

0-1299 
“1374 
‘1474 
‘1631 
1953 

0-1279 
1354 
“1452 
1607 
1925 


| 


| 


0-1613 
-1700 
-1814 
-1992 
*2355 

0-1586 
-1671 
-1784 
-1959 
+2316 

0-1559 
-1644 
-1754 
-1927 
+2280 

0-1533 
-1617 
-1726 
-1896 
-2244 

0-1509 
“1591 
-1698 
-1866 
+2209 

0-1485 
*1565 
‘1671 
*1837 
*2175 

0-1461 
-1541 
-1645 
-1809 
*2143 


0:1439 
‘1517 
+1620 
+1782 
*2111 

0-1417 

-1494 

-1596 

-1755 

-2080 

-1396 

1472 

*1572 

-1729 

-2050 


—) 


0-1738 
-1826 
-1942 
-2122 
-2486 

0-1709 
-1796 
-1910 
-2087 
+2446 


0-1681 
-1766 
-1878 
2053 
-2408 

0-1653 
-1738 
-1848 
2021 
-2370 

01627 
‘1710 
-1819 
-1989 
2334 

0:1601 
1683 
‘1791 
1958 
-2299 

0-1576 
1657 
1763 
-1929 
+2265 

0-1552 
1632 
-1737 
-1900 
+2232 

0-1529 
1608 
‘1711 
-1872 
-2200 

0-1506 
“1584 
1686 
“1845 
-2168 


0-1857 
-1946 
-2063 
+2244 
-2610 

0-1826 
-1914 
-2029 
-2208 
-2569 

0-1796 
-1883 
-1996 
2172 
-2529 

0-1767 

-1853 

-1964 

-2138 

-2490 

-1739 

-1824 

“1934 

-2105 

+2452 

0-1712 
1795 
-1904 
-2073 
2416 

0-1686 
-1768 
“1875 
-2042 
2381 

0-1660 
“1741 
-1847 
-2012 
2346 

0-1636 
-1716 
-1820 
-1983 
2313 

0-1612 
-1691 
-1794 
1954 
-2280 


~ 
= 





0-1970 
-2060 
‘2178 
+2360 
2727 

0-1937 
+2026 
-2142 
-2322 
*2685 

0-1906 
-1994 
+2108 
-2286 
2643 

0-1876 
-1962 
-2075 
+2250 
+2603 

0-1847 

‘1932 

+2043 

-2216 

+2565 

"1818 

-1902 

-2012 

-2183 

+2527 

0-1791 
-1874 
-1982 
-2150 
+2490 

0-1764 
-1846 
-1953 
*2119 
*2455 

0-1738 
-1819 
*1924 
-2088 
+2420 

0-1713 
-1792 
-1897 
-2059 
-2386 


f—) 


11 


0-2078 
+2169 
+2288 
-2471 
+2839 

0-2044 
-2134 
*2251 
*2432 
2705 

0-2012 

-2100 

*2215 

+2394 

-2753 

-1980 

-2067 

-2181 

-2357 

-2711 

0-1950 
-2036 
-2148 
+2322 
-2671 

0-1920 
-2005 
“2115 
*2287 
-2633 

0-1891 
-1975 
-2084 
+2254 
+2595 

0-1863 
-1946 
-2054 
-2221 
*2558 

0-1836 
-1918 
-2024 
-2189 
+2523 

0-1809 
-1890 
-1995 
-2159 
-2488 


f—) 









































































Generalized Beta distribution (cont.) 

                            ν₂
 ν₁     P          4       5       6       7       8       9      10      11

 125   0.80   0.1004  0.1137  0.1260  0.1375  0.1484  0.1589  0.1688  0.1784
       0.85   0.1074  0.1209  0.1334  0.1451  0.1561  0.1666  0.1767  0.1864
       0.90   0.1167  0.1304  0.1431  0.1549  0.1662  0.1768  0.1870  0.1967
       0.95   0.1313  0.1454  0.1583  0.1704  0.1818  0.1927  0.2030  0.2129
       0.99   0.1618  0.1763  0.1897  0.2021  0.2138  0.2249  0.2354  0.2454

 127   0.80   0.0989  0.1120  0.1241  0.1355  0.1463  0.1566  0.1664  0.1759
       0.85   0.1058  0.1191  0.1314  0.1430  0.1539  0.1643  0.1742  0.1838
       0.90   0.1149  0.1285  0.1410  0.1527  0.1638  0.1743  0.1844  0.1940
       0.95   0.1294  0.1433  0.1561  0.1680  0.1793  0.1900  0.2002  0.2100
       0.99   0.1594  0.1738  0.1870  0.1993  0.2109  0.2218  0.2322  0.2421

 129   0.80   0.0975  0.1104  0.1223  0.1336  0.1442  0.1544  0.1641  0.1735
       0.85   0.1043  0.1174  0.1295  0.1409  0.1517  0.1620  0.1718  0.1812
       0.90   0.1132  0.1266  0.1390  0.1506  0.1615  0.1719  0.1818  0.1914
       0.95   0.1275  0.1412  0.1538  0.1656  0.1768  0.1874  0.1975  0.2071
       0.99   0.1572  0.1714  0.1844  0.1966  0.2080  0.2188  0.2291  0.2389

 131   0.80   0.0960  0.1088  0.1206  0.1317  0.1422  0.1523  0.1619  0.1711
       0.85   0.1027  0.1157  0.1277  0.1389  0.1496  0.1597  0.1695  0.1788
       0.90   0.1116  0.1248  0.1370  0.1485  0.1593  0.1695  0.1794  0.1888
       0.95   0.1257  0.1392  0.1517  0.1634  0.1744  0.1848  0.1948  0.2044
       0.99   0.1549  0.1690  0.1819  0.1939  0.2052  0.2159  0.2261  0.2358

 133   0.80   0.0947  0.1072  0.1189  0.1299  0.1403  0.1502  0.1597  0.1688
       0.85   0.1013  0.1141  0.1259  0.1370  0.1475  0.1576  0.1672  0.1764
       0.90   0.1100  0.1231  0.1351  0.1464  0.1571  0.1672  0.1770  0.1863
       0.95   0.1239  0.1373  0.1496  0.1611  0.1720  0.1823  0.1922  0.2017
       0.99   0.1528  0.1667  0.1794  0.1913  0.2025  0.2131  0.2232  0.2328

 135   0.80   0.0933  0.1057  0.1173  0.1281  0.1384  0.1482  0.1575  0.1666
       0.85   0.0998  0.1125  0.1242  0.1351  0.1455  0.1555  0.1650  0.1741
       0.90   0.1085  0.1213  0.1333  0.1444  0.1550  0.1650  0.1746  0.1839
       0.95   0.1222  0.1354  0.1476  0.1589  0.1697  0.1799  0.1897  0.1991
       0.99   0.1507  0.1644  0.1770  0.1888  0.1998  0.2103  0.2203  0.2298

 137   0.80   0.0920  0.1043  0.1157  0.1263  0.1365  0.1462  0.1555  0.1644
       0.85   0.0985  0.1109  0.1225  0.1333  0.1436  0.1534  0.1628  0.1718
       0.90   0.1070  0.1197  0.1314  0.1425  0.1529  0.1628  0.1724  0.1815
       0.95   0.1205  0.1335  0.1456  0.1568  0.1675  0.1776  0.1872  0.1965
       0.99   0.1487  0.1622  0.1747  0.1863  0.1972  0.2076  0.2175  0.2269

 139   0.80   0.0908  0.1029  0.1141  0.1247  0.1347  0.1443  0.1534  0.1623
       0.85   0.0971  0.1094  0.1208  0.1315  0.1417  0.1514  0.1607  0.1696
       0.90   0.1055  0.1181  0.1297  0.1406  0.1509  0.1607  0.1701  0.1792
       0.95   0.1189  0.1317  0.1436  0.1548  0.1653  0.1753  0.1849  0.1940
       0.99   0.1467  0.1601  0.1724  0.1839  0.1947  0.2050  0.2148  0.2241

 141   0.80   0.0895  0.1015  0.1126  0.1230  0.1329  0.1424  0.1515  0.1602
       0.85   0.0958  0.1079  0.1192  0.1298  0.1399  0.1494  0.1586  0.1675
       0.90   0.1041  0.1165  0.1280  0.1388  0.1489  0.1587  0.1680  0.1769
       0.95   0.1173  0.1300  0.1418  0.1528  0.1632  0.1731  0.1825  0.1916
       0.99   0.1448  0.1580  0.1702  0.1815  0.1923  0.2024  0.2121  0.2214

 143   0.80   0.0883  0.1001  0.1111  0.1214  0.1312  0.1406  0.1496  0.1582
       0.85   0.0945  0.1065  0.1177  0.1281  0.1381  0.1475  0.1566  0.1654
       0.90   0.1027  0.1150  0.1263  0.1370  0.1470  0.1567  0.1659  0.1747
       0.95   0.1157  0.1283  0.1399  0.1508  0.1611  0.1709  0.1803  0.1892
       0.99   0.1429  0.1560  0.1680  0.1793  0.1899  0.1999  0.2095  0.2187

This table gives the values of x for which Pr(θ_max ≤ x) = I_x(4; p, q) = P, where p = ½(ν₂ − 3), q = ½(ν₁ − 3). 











Generalized Beta distribution (cont.) 

                            ν₂
 ν₁     P          4       5       6       7       8

 145   0.80   0.0872  0.0988  0.1097  0.1199  0.1295
       0.85   0.0933  0.1051  0.1161  0.1265  0.1363
       0.90   0.1014  0.1135  0.1247  0.1352  0.1452
       0.95   0.1142  0.1266  0.1381  0.1489  0.1591
       0.99   0.1410  0.1540  0.1659  0.1770  0.1875

 147   0.80   0.0860  0.0975  0.1082  0.1183  0.1279
       0.85   0.0920  0.1038  0.1147  0.1249  0.1346
       0.90   0.1000  0.1120  0.1231  0.1335  0.1434
       0.95   0.1127  0.1250  0.1364  0.1471  0.1571
       0.99   0.1393  0.1521  0.1639  0.1749  0.1853

 149   0.80   0.0849  0.0963  0.1069  0.1169  0.1263
       0.85   0.0909  0.1025  0.1132  0.1233  0.1329
       0.90   0.0988  0.1106  0.1216  0.1319  0.1416
       0.95   0.1113  0.1235  0.1347  0.1452  0.1552
       0.99   0.1375  0.1502  0.1618  0.1727  0.1830

 151   0.80   0.0838  0.0951  0.1055  0.1154  0.1248
       0.85   0.0897  0.1012  0.1118  0.1218  0.1313
       0.90   0.0975  0.1092  0.1201  0.1302  0.1399
       0.95   0.1099  0.1219  0.1330  0.1435  0.1533
       0.99   0.1358  0.1483  0.1599  0.1707  0.1809

 153   0.80   0.0828  0.0939  0.1042  0.1140  0.1233
       0.85   0.0886  0.0999  0.1104  0.1203  0.1297
       0.90   0.0963  0.1079  0.1186  0.1287  0.1382
       0.95   0.1086  0.1204  0.1314  0.1417  0.1515
       0.99   0.1342  0.1466  0.1580  0.1687  0.1787

 155   0.80   0.0817  0.0927  0.1030  0.1126  0.1218
       0.85   0.0875  0.0987  0.1091  0.1189  0.1282
       0.90   0.0951  0.1066  0.1172  0.1271  0.1366
       0.95   0.1072  0.1190  0.1299  0.1401  0.1497
       0.99   0.1326  0.1448  0.1561  0.1667  0.1767

 157   0.80   0.0807  0.0916  0.1017  0.1113  0.1203
       0.85   0.0864  0.0975  0.1078  0.1175  0.1267
       0.90   0.0940  0.1053  0.1158  0.1256  0.1350
       0.95   0.1059  0.1176  0.1283  0.1384  0.1480
       0.99   0.1310  0.1431  0.1543  0.1648  0.1746

 159   0.80   0.0798  0.0905  0.1005  0.1100  0.1189
       0.85   0.0854  0.0963  0.1065  0.1161  0.1252
       0.90   0.0928  0.1040  0.1144  0.1241  0.1334
       0.95   0.1047  0.1162  0.1268  0.1368  0.1463
       0.99   0.1294  0.1414  0.1525  0.1629  0.1727

 161   0.80   0.0788  0.0894  0.0993  0.1087  0.1176
       0.85   0.0844  0.0952  0.1053  0.1147  0.1237
       0.90   0.0917  0.1028  0.1131  0.1227  0.1319
       0.95   0.1034  0.1148  0.1253  0.1352  0.1446
       0.99   0.1279  0.1398  0.1508  0.1610  0.1707

 163   0.80   0.0779  0.0884  0.0982  0.1074  0.1162
       0.85   0.0834  0.0941  0.1040  0.1134  0.1223
       0.90   0.0907  0.1016  0.1118  0.1213  0.1304
       0.95   0.1022  0.1135  0.1239  0.1337  0.1430
       0.99   0.1265  0.1382  0.1491  0.1592  0.1688








0-1388 
+1457 
+1547 
-1688 
*1975 


0-1371 
-1439 
-1528 
-1667 
*1951 

0-1354 
-1421 
-1509 
*1647 
-1928 

0-1337 
-1404 
-1491 
-1627 
-1905 


0-1321 
-1387 
-1473 
-1608 
-1883 


0-1306 
1371 
+1456 
-1589 
-1861 


0-1290 
+1355 
-1439 
*1571 
-1840 


0-1275 
-1339 
-1422 
-1553 
-1820 

0-1261 
*1324 
-1406 
*1535 
-1799 


0-1246 
-1309 
-1390 
*1518 
-1780 








10 11 
0-1477 0-1562 
*1547 -1633 
+1638 -1726 
-1780 -1869 
-2070 -2161 
0-1459 0-1543 
-1528 -1613 
-1618 -1705 
-1759 -1847 
+2045 -2135 
0-1441 0-1525 
-1509 -1594 
-1598 *1684 
-1738 *1825 
2021 -2110 
0-1423 0-1506 
-1491 +1575 
*1579 -1664 
-1717 -1803 
-1998 +2086 
0-1406 0-1488 
-1473 -1556 
-1561 *1645 
-1697 -1783 
*1975 -2062 
0-1390 0-1471 
*1456 -1538 
*1542 -1626 
°1677 +1762 
*1952 *2039 
0-1374 0-1454 
-1439 -1520 
“1525 -1607 
-1658 +1742 
-1930 -2016 
0-1358 0-1437 
-1423 -1503 
-1507 -1589 
-1639 °1722 
-1909 *1994 
0-1342 0-1421 
-1406 +1486 
-1490 *1571 
-1621 -1703 
-1888 *1972 
0-1327 0-1405 
-1391 -1470 
*1474 +1554 
-1603 -1685 
-1867 *1951 








This table gives the values of $x$ for which $\Pr(\theta_{\max} < x) = I_x(4; p, q) = P$, where $p = \tfrac{1}{2}(\nu_2 - 3)$, $q = \tfrac{1}{2}(\nu_1 - 3)$.


Upper percentage points of the generalized Beta distribution. III 























































$\nu_1$  P     $\nu_2$ = 4    5       6       7       8       9       10      11
165    0.80   0.0770  0.0874  0.0971  0.1062  0.1149  0.1232  0.1312  0.1390
        .85    .0824   .0930   .1028   .1121   .1210   .1294   .1375   .1453
        .90    .0896   .1004   .1105   .1199   .1289   .1375   .1457   .1537
        .95    .1010   .1122   .1225   .1322   .1414   .1502   .1586   .1666
        .99    .1250   .1367   .1474   .1575   .1670   .1760   .1847   .1930
167    0.80   0.0761  0.0864  0.0960  0.1050  0.1136  0.1219  0.1298  0.1375
        .85    .0815   .0919   .1017   .1109   .1196   .1280   .1360   .1438
        .90    .0886   .0993   .1092   .1186   .1275   .1360   .1441   .1520
        .95    .0999   .1109   .1211   .1307   .1398   .1485   .1568   .1648
        .99    .1236   .1351   .1458   .1558   .1652   .1741   .1827   .1910
169    0.80   0.0752  0.0854  0.0949  0.1038  0.1124  0.1205  0.1284  0.1360
        .85    .0805   .0909   .1005   .1096   .1183   .1266   .1345   .1422
        .90    .0876   .0982   .1080   .1173   .1261   .1345   .1426   .1504
        .95    .0988   .1097   .1198   .1293   .1383   .1469   .1552   .1631
        .99    .1222   .1336   .1442   .1541   .1634   .1723   .1808   .1890
171    0.80   0.0744  0.0844  0.0938  0.1027  0.1111  0.1192  0.1270  0.1345
        .85    .0796   .0899   .0994   .1084   .1170   .1252   .1331   .1407
        .90    .0866   .0971   .1068   .1160   .1247   .1331   .1411   .1488
        .95    .0977   .1085   .1185   .1279   .1368   .1453   .1535   .1614
        .99    .1209   .1322   .1426   .1524   .1617   .1705   .1789   .1870
173    0.80   0.0735  0.0835  0.0928  0.1016  0.1099  0.1179  0.1257  0.1331
        .85    .0787   .0889   .0983   .1073   .1157   .1239   .1317   .1392
        .90    .0856   .0960   .1057   .1147   .1234   .1316   .1396   .1472
        .95    .0966   .1073   .1172   .1265   .1354   .1438   .1519   .1597
        .99    .1196   .1308   .1411   .1508   .1600   .1687   .1771   .1851
175    0.80   0.0727  0.0826  0.0918  0.1005  0.1088  0.1167  0.1243  0.1317
        .85    .0779   .0879   .0973   .1061   .1145   .1226   .1303   .1378
        .90    .0847   .0950   .1045   .1135   .1221   .1303   .1381   .1457
        .95    .0955   .1061   .1159   .1252   .1339   .1423   .1503   .1580
        .99    .1183   .1294   .1396   .1492   .1583   .1670   .1752   .1832
177    0.80   0.0719  0.0817  0.0908  0.0994  0.1076  0.1155  0.1230  0.1303
        .85    .0770   .0870   .0962   .1050   .1133   .1213   .1289   .1363
        .90    .0838   .0939   .1034   .1123   .1208   .1289   .1367   .1442
        .95    .0945   .1050   .1147   .1239   .1325   .1408   .1488   .1564
        .99    .1170   .1280   .1382   .1477   .1567   .1653   .1735   .1814
179    0.80   0.0712  0.0808  0.0898  0.0984  0.1065  0.1143  0.1218  0.1290
        .85    .0762   .0860   .0952   .1039   .1121   .1200   .1276   .1349
        .90    .0829   .0929   .1023   .1111   .1195   .1276   .1353   .1427
        .95    .0935   .1039   .1135   .1226   .1312   .1394   .1473   .1549
        .99    .1158   .1267   .1367   .1462   .1551   .1636   .1717   .1796
181    0.80   0.0704  0.0800  0.0889  0.0973  0.1054  0.1131  0.1205  0.1277
        .85    .0754   .0851   .0942   .1028   .1110   .1188   .1263   .1336
        .90    .0820   .0920   .1012   .1100   .1183   .1263   .1339   .1413
        .95    .0925   .1028   .1123   .1213   .1298   .1380   .1458   .1533
        .99    .1146   .1254   .1353   .1447   .1535   .1620   .1700   .1778
183    0.80   0.0697  0.0791  0.0880  0.0963  0.1043  0.1119  0.1193  0.1264
        .85    .0746   .0842   .0932   .1017   .1098   .1176   .1250   .1322
        .90    .0811   .0910   .1002   .1089   .1171   .1250   .1326   .1399
        .95    .0915   .1017   .1112   .1201   .1285   .1366   .1443   .1518
        .99    .1134   .1241   .1340   .1432   .1520   .1603   .1684   .1761























This table gives the values of $x$ for which $\Pr(\theta_{\max} < x) = I_x(4; p, q) = P$, where $p = \tfrac{1}{2}(\nu_2 - 3)$, $q = \tfrac{1}{2}(\nu_1 - 3)$.
$\nu_1$  P     $\nu_2$ = 4    5       6       7       8       9       10      11
185    0.80   0.0689  0.0783  0.0871  0.0953  0.1032  0.1108  0.1181  0.1252
        .85    .0738   .0834   .0923   .1007   .1087   .1164   .1238   .1309
        .90    .0803   .0901   .0992   .1078   .1159   .1237   .1312   .1385
        .95    .0906   .1007   .1101   .1189   .1272   .1352   .1429   .1503
        .99    .1123   .1228   .1326   .1418   .1505   .1588   .1667   .1744
187    0.80   0.0682  0.0775  0.0862  0.0944  0.1022  0.1097  0.1169  0.1239
        .85    .0730   .0825   .0913   .0997   .1076   .1152   .1226   .1296
        .90    .0795   .0892   .0982   .1067   .1148   .1225   .1300   .1371
        .95    .0897   .0997   .1089   .1177   .1260   .1339   .1415   .1488
        .99    .1111   .1216   .1313   .1404   .1490   .1572   .1651   .1727
189    0.80   0.0675  0.0767  0.0853  0.0934  0.1012  0.1086  0.1158  0.1227
        .85    .0723   .0817   .0904   .0987   .1066   .1141   .1214   .1284
        .90    .0787   .0883   .0972   .1056   .1136   .1213   .1287   .1358
        .95    .0888   .0987   .1079   .1165   .1247   .1326   .1401   .1474
        .99    .1100   .1204   .1300   .1390   .1476   .1557   .1635   .1710
191    0.80   0.0668  0.0759  0.0845  0.0925  0.1002  0.1076  0.1147  0.1215
        .85    .0716   .0809   .0895   .0977   .1055   .1130   .1202   .1271
        .90    .0779   .0874   .0962   .1046   .1125   .1201   .1274   .1345
        .95    .0879   .0977   .1068   .1154   .1235   .1313   .1388   .1460
        .99    .1089   .1192   .1287   .1377   .1462   .1542   .1620   .1694
193    0.80   0.0662  0.0752  0.0836  0.0916  0.0992  0.1065  0.1135  0.1204
        .85    .0709   .0801   .0886   .0968   .1045   .1119   .1190   .1259
        .90    .0771   .0865   .0953   .1035   .1114   .1190   .1262   .1332
        .95    .0870   .0967   .1057   .1142   .1223   .1300   .1375   .1446
        .99    .1078   .1180   .1275   .1364   .1448   .1528   .1605   .1679
195    0.80   0.0655  0.0744  0.0828  0.0907  0.0982  0.1055  0.1125  0.1192
        .85    .0701   .0793   .0878   .0958   .1035   .1108   .1179   .1247
        .90    .0763   .0857   .0943   .1025   .1104   .1178   .1250   .1320
        .95    .0862   .0958   .1047   .1132   .1212   .1288   .1362   .1433
        .99    .1068   .1169   .1263   .1351   .1434   .1514   .1590   .1663

This table gives the values of $x$ for which $\Pr(\theta_{\max} < x) = I_x(4; p, q) = P$, where $p = \tfrac{1}{2}(\nu_2 - 3)$, $q = \tfrac{1}{2}(\nu_1 - 3)$.
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ESTIMATION OF PARAMETERS OF MIXED EXPONENTIALLY 
DISTRIBUTED FAILURE TIME DISTRIBUTIONS 
FROM CENSORED LIFE TEST DATA* 


By WILLIAM MENDENHALL 


North Carolina State College and Research Techniques Unit, 
London School of Economics 


AND R. J. HADER 
North Carolina State College 


SUMMARY. Statistical methods in life testing analysis have been developed in the past primarily 
for the case of a single failure population. In this paper a failure population which can be divided 
into subpopulations, each representing a different type or cause of failure, is considered. Estimates 
of the population parameters are obtained in the case where the subpopulations are exponentially
distributed and sampling is censored at a predetermined test termination time. 


1. INTRODUCTION


Mixed failure populations are encountered in many fields of applied science. In particular, 
engineers have been cognizant of a phenomenon described as ‘early failures’ in tests on 
electronic tubes and other devices. It has frequently been observed that the failure rate is 
initially relatively high, then actually decreases with increasing age. As the item becomes 
still older the failure rate either becomes constant or again increases with age depending on 
the basic failure mechanism involved. This behaviour suggests strongly that the population 
is not homogeneous but rather is made up of several subpopulations mixed in unknown 
proportions. 

For practical purposes, the engineer may divide the failures of a system, or a device, into 
two or more different types of causes. An example is presented by Acheson & McElwee 
(1951), who divided electronic tube failures into gaseous defects, mechanical defects, and 
normal deterioration of the cathode. One would like to know the fraction of the population 
which will fail due to each cause in order to optimally concentrate effort on redesign of the 
system or to improve manufacturing methods. In addition, it would be desirable to know 
the distribution of failure for each type or cause of failure, for example, in order to institute 
an ageing process to eliminate early failures from production. Other references to mixed 
failure populations are made in papers by Davis (1952), Epstein (1953), Herd (1953),
Madison (1955), Steen (1952), and Wilde (1952). 

This paper will be concerned with the problem of estimating the parameters of a mixed 
population model based upon a sample censored at a fixed test termination time. Attention 
will be directed primarily to the case of two subpopulations of failure, each exponentially 
distributed, because of the light which these results shed on the general problem of estimating 
parameters from mixed population models. The problems of estimation in the case of any 
number of failure subpopulations, each distributed according to a Weibull distribution, will 
be considered briefly in § 8. 

* The authors are indebted to the General Electric Company for sponsorship and financial support
in this research. The results of this paper are part of a thesis issued by the Institute of Statistics, North
Carolina State College, Mimeograph Series, No. 171.
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It should be noted at this point that the authors are not attempting to establish a case for 
the practicality of the exponential distribution in describing subpopulation failure. Ex- 
ponential failure distributions do occur often in practice, as noted by Davis (1952), Epstein 
(1953), and others, and this fact provides sufficient justification for their use as a starting- 
point in the consideration of mixed failure population problems. 
2. THE POPULATION AND THE MODEL


A population is postulated which is composed of $s = 2$ subpopulations, representing failure
types, mixed in proportion $p : (1 - p)$, where $0 < p < 1$. For simplicity of notation, let $q = 1 - p$.
Each unit of the population conceptually contains a tag which indicates the subpopulation
to which the unit belongs and hence defines the way in which that particular unit will fail.
The information on the tag, i.e. the cause of failure, is obtained only after failure has occurred.

The failure times for the $i$th subpopulation, $i = 1, 2$, are assumed to have a cumulative
failure probability distribution defined by

\[ F_i(t) = 1 - e^{-t/\alpha_i} \qquad (0 < t < \infty). \tag{2.1} \]

Here and hereafter, $i$ may take the values 1 or 2. If $p$ is the proportion of units belonging to
subpopulation $i = 1$, then the cumulative distribution function for the population is

\[ F(t) = pF_1(t) + qF_2(t), \tag{2.2} \]
and the density function,
\[ f(t) = pf_1(t) + qf_2(t). \tag{2.3} \]
Also let
\[ G_i(t) = 1 - F_i(t) \tag{2.4} \]
and
\[ G(t) = 1 - F(t). \tag{2.5} \]

The probability function, $G(t)$, is the probability that a unit will survive to time $t$ and is
called the survival function.

If the entire population were put on test (or into service) the proportion of items belonging
to each subpopulation would, in general, change with time. This is because the items from
one subpopulation would die off more rapidly than those from the other. At time $t$, the
subpopulations would be mixed in the proportions $p(t) : 1 - p(t)$. The quantities $p(t)$ and
$1 - p(t)$ will be called the conditional mixture proportions. Obviously

\[ p(t) = \frac{pG_1(t)}{G(t)}, \qquad p(0) = p. \tag{2.6} \]
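
As an illustration of equation (2.6), the following minimal Python sketch evaluates the conditional mixture proportion for two exponential subpopulations. The function names and the numerical inputs are hypothetical and chosen only for illustration; the sketch assumes the exponential survival function $G_i(t) = e^{-t/\alpha_i}$ implied by (2.1) and (2.4).

```python
import math


def survival(t, alpha):
    """G_i(t) = exp(-t / alpha_i) for an exponential subpopulation with mean life alpha_i."""
    return math.exp(-t / alpha)


def conditional_mixture(t, p, alpha1, alpha2):
    """Conditional mixture proportion p(t) = p * G_1(t) / G(t), as in equation (2.6)."""
    g1 = survival(t, alpha1)
    g2 = survival(t, alpha2)
    g = p * g1 + (1.0 - p) * g2          # population survival function G(t)
    return p * g1 / g


if __name__ == "__main__":
    # Illustrative values only: subpopulation (1) has the shorter mean life,
    # so its share among the survivors shrinks as t grows.
    for t in (0.0, 100.0, 300.0, 630.0):
        print(t, round(conditional_mixture(t, 0.3, 234.0, 336.0), 4))
```

When the two mean lives are equal the proportion stays at $p$ for every $t$, which is a quick sanity check on the formula.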





3. SAMPLING 


Due to restrictions on time available for testing, the experimenter frequently desires to
conclude the life test after a predetermined length of time has elapsed or after a predetermined
number of units have failed. Sampling of this type is known as censored sampling.
This paper will consider the sampling as censored with a fixed test termination time. A random
sample of $n$ units is drawn from the population and placed on test. The test is terminated
at a fixed time, $T$, at which time $r$ units have failed, $r_i$ from subpopulation $(i)$, with $r_1 + r_2 = r$.
The time of failure of the $j$th unit from subpopulation $(i)$, $t_{ij}$, is observed. It will hereafter be
assumed that $j$ always ranges from $j = 1$ to $r_i$ when not specified. The $(n - r)$ units which
have not failed yield no information as to the subpopulation from which they were drawn.
The random variables observed are therefore the $r_i$ and the $t_{ij}$, $j = 1, 2, \ldots, r_i$, and $i = 1, 2$.
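4. ESTIMATION OF PARAMETERS


Case A. Relative magnitude of subpopulation parameters not known

Estimates of the population parameters are obtained by the method of maximum likelihood.
For convenience, assume that all measurements of time are in units of size $T$, the test
termination time. Therefore let $x = t/T$ and let $\beta_i = \alpha_i/T$. Then
\[ F_i(x) = 1 - e^{-x/\beta_i} \qquad (0 < x < \infty). \tag{4.1} \]

Given a random sample of $n$ units, the probability of $r_1$ units failing due to cause (1), $r_2$
units failing due to cause (2), and $(n - r)$ units surviving is the multinomial,
\[ P(r_1, r_2, n - r \mid n) = \frac{n!}{r_1!\, r_2!\, (n - r)!}\, [pF_1(1)]^{r_1} [qF_2(1)]^{r_2} [G(1)]^{n-r}. \tag{4.2} \]

The conditional density of obtaining the ordered observations, $x_{i1}, x_{i2}, \ldots, x_{ir_i}$, given $r_i$ and
$x_{ij} < 1$, is
\[ P(x_{i1}, x_{i2}, \ldots, x_{ir_i} \mid r_i,\ x_{ij} < 1) = \frac{r_i! \prod_{j=1}^{r_i} f_i(x_{ij})}{[F_i(1)]^{r_i}}. \tag{4.3} \]
It then follows that the likelihood, $L$, for the sample is
\[ L = \frac{n!}{(n - r)!}\, p^{r_1} q^{r_2} [G(1)]^{n-r} \prod_{j=1}^{r_1} f_1(x_{1j}) \prod_{j=1}^{r_2} f_2(x_{2j}). \tag{4.4} \]
Taking the first partial derivatives of $\ln L$,
\[ \frac{\partial \ln L}{\partial \beta_1} = \frac{k(n - r)}{\beta_1^2} - \frac{r_1}{\beta_1} + \frac{r_1 \bar{x}_1}{\beta_1^2}, \tag{4.5} \]
\[ \frac{\partial \ln L}{\partial \beta_2} = \frac{(1 - k)(n - r)}{\beta_2^2} - \frac{r_2}{\beta_2} + \frac{r_2 \bar{x}_2}{\beta_2^2}, \tag{4.6} \]
\[ \frac{\partial \ln L}{\partial p} = \frac{r_1}{p} - \frac{r_2}{q} + \frac{(n - r)(k - p)}{pq}, \tag{4.7} \]
where
\[ k = \frac{pG_1(1)}{G(1)} \tag{4.8} \]
\[ \phantom{k} = \frac{1}{1 + (q/p)\, e^{1/\beta_1 - 1/\beta_2}}, \tag{4.9} \]
and $\bar{x}_i = \sum_j x_{ij}/r_i$.

Referring to equation (2.6), it can be seen that $k = p(1)$, the conditional mixture proportion
at the test termination time, $x = 1$.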
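When the partial derivatives are equated to zero the estimating equations are

\[ \hat{p} = \frac{r_1}{n} + \hat{k}\, \frac{(n - r)}{n}, \tag{4.10} \]

\[ \hat{\beta}_1 = \bar{x}_1 + \hat{k}\, \frac{(n - r)}{r_1}, \tag{4.11} \]

\[ \hat{\beta}_2 = \bar{x}_2 + (1 - \hat{k})\, \frac{(n - r)}{r_2}, \tag{4.12} \]

\[ \hat{k} = \frac{1}{1 + (\hat{q}/\hat{p}) \exp(1/\hat{\beta}_1 - 1/\hat{\beta}_2)}. \tag{4.13} \]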
The estimates of $\beta_1$, $\beta_2$, and $p$ must be obtained from the solution of the simultaneous
equations (4.10), (4.11), (4.12) and (4.13). Utilizing equations (4.10), (4.11) and (4.12) to substitute
in equation (4.13) for $\hat{p}$, $\hat{\beta}_1$, and $\hat{\beta}_2$ yields a single equation, involving only $k$, of the form

\[ k = g(k), \]

where $g(k)$ is a function of $k$. Since $k$ is bounded, $0 < k < 1$, it is relatively easy to obtain $\hat{k}$ by
considering $g(k) - k$ versus $k$ and obtaining the solution where $g(k) - k = 0$. The function
$g(k) - k$ will be positive or zero when $k = 0$.


Fig. 1. Maximum likelihood estimate of $\beta$ as a function of $\bar{x}$ based on a sample from a truncated
exponential distribution. Measurements expressed in units of truncation time $T$.


A good first approximation to $\hat{k}$ can be obtained by using a modification of the maximum
likelihood estimate obtained by Deemer & Votaw (1955) for the case of samples drawn from
a single truncated exponential distribution. The maximum likelihood estimate of $\beta_i$, where
the distribution is assumed to be truncated at time $T$, is the solution of

\[ (\hat{\beta}_i - \bar{x}_i)(e^{1/\hat{\beta}_i} - 1) = 1. \tag{4.14} \]

The solutions, $\hat{\beta}_i$, can be obtained graphically from Fig. 1, where $\hat{\beta}_i$ is given as a function of
$\bar{x}_i$. Choose the smaller $\bar{x}_i$ and identify this as subpopulation (1). Obtain the corresponding
$\beta_{10}$ from Fig. 1. Then, substituting into

\[ \beta_{10} = \bar{x}_1 + k_0\, \frac{(n - r)}{r_1}, \tag{4.15} \]

solve for $k_0$.
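
Where Fig. 1 is not to hand, equation (4.14) can be solved numerically instead of graphically. The sketch below is one possible way to do so, using bisection on the equivalent form $\bar{x} = \beta - 1/(e^{1/\beta} - 1)$ and then solving (4.15) for $k_0$. The function names, the bracketing interval and the tolerance are assumptions made for illustration and are not part of the original procedure.

```python
import math


def truncated_mean(beta):
    """Mean of an exponential with scale beta truncated to (0, 1); rearranged form of (4.14)."""
    return beta - 1.0 / math.expm1(1.0 / beta)


def solve_beta(xbar, lo=0.01, hi=1000.0, tol=1e-10):
    """Solve (beta - xbar) * (exp(1/beta) - 1) = 1 for beta by bisection."""
    if not truncated_mean(lo) < xbar < truncated_mean(hi):
        raise ValueError("xbar lies outside the range covered by the bracket")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        # truncated_mean is increasing in beta, so move the matching end of the bracket
        if truncated_mean(mid) < xbar:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def initial_k(xbar1, n, r, r1):
    """First approximation k_0 from equation (4.15), given the smaller sample mean xbar1."""
    beta10 = solve_beta(xbar1)
    k0 = (beta10 - xbar1) * r1 / (n - r)
    return beta10, k0
```

A numerical root will generally differ slightly from a value read off a graph, so the resulting $k_0$ should be treated only as a starting point for the iteration that follows.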

The quantity $D_0 = g(k_0) - k_0$ can now be computed. Since $g(0) > 0$, the value of $k$ which
satisfies $D = 0$, and hence provides a solution to equations (4.10), (4.11), (4.12) and (4.13), must be
$k < k_0$ or $k > k_0$ depending upon whether $D_0$ is negative or positive.

Letting
\[ v = (q/p) \exp(1/\beta_1 - 1/\beta_2), \tag{4.16} \]
\[ dk = \frac{-dD}{1 + g(k)^2\, (dv/dk)}, \tag{4.17} \]
where
\[ \frac{dv}{dk} = -v(n - r) \left[ \frac{1}{nq} + \frac{1}{np} + \frac{1}{r_1 \beta_1^2} + \frac{1}{r_2 \beta_2^2} \right]. \tag{4.18} \]
Choosing $dD = -D_0$,
\[ k_1 = k_0 + dk_0 \tag{4.19} \]
\[ \phantom{k_1} = k_0 + \frac{D_0}{1 + g(k_0)^2\, (dv/dk_0)}. \tag{4.20} \]

This iterative process can be repeated until the desired degree of accuracy has been obtained.
The estimates of $\beta_1$, $\beta_2$ and $p$ are then obtained by substituting the solution for $k$ into
estimating equations (4.10), (4.11) and (4.12).
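
The iteration can be sketched in a few lines of Python. The sketch assumes the estimating equations take the form $\hat{p} = [r_1 + k(n - r)]/n$, $\hat{\beta}_1 = \bar{x}_1 + k(n - r)/r_1$ and $\hat{\beta}_2 = \bar{x}_2 + (1 - k)(n - r)/r_2$, which is the form implied by (4.15) and by the numerical example of § 5. The function names, tolerance and iteration cap are hypothetical.

```python
import math


def estimates(k, n, r1, r2, xbar1, xbar2):
    """Evaluate the estimating equations for p, beta1 and beta2 at a trial value of k."""
    s = n - r1 - r2                          # number of survivors, n - r
    p = (r1 + k * s) / n
    beta1 = xbar1 + k * s / r1
    beta2 = xbar2 + (1.0 - k) * s / r2
    return p, beta1, beta2


def solve_k(k0, n, r1, r2, xbar1, xbar2, tol=1e-10, max_iter=100):
    """Repeat the correction k <- k + D / (1 + g^2 dv/dk) until |D| < tol."""
    s = n - r1 - r2
    k = k0
    for _ in range(max_iter):
        p, b1, b2 = estimates(k, n, r1, r2, xbar1, xbar2)
        q = 1.0 - p
        v = (q / p) * math.exp(1.0 / b1 - 1.0 / b2)            # equation (4.16)
        g = 1.0 / (1.0 + v)                                    # g(k)
        d = g - k                                              # D = g(k) - k
        if abs(d) < tol:
            break
        dv = -v * s * (1.0 / (n * q) + 1.0 / (n * p)
                       + 1.0 / (r1 * b1 ** 2) + 1.0 / (r2 * b2 ** 2))   # equation (4.18)
        k += d / (1.0 + g * g * dv)                            # equation (4.20)
    return k, estimates(k, n, r1, r2, xbar1, xbar2)
```

For instance, `solve_k(0.186, 369, 107, 218, 0.3034862, 0.3644677)` uses the sample quantities quoted in § 5. The sketch does not guard against $r_1 = 0$ or $r_2 = 0$, nor against a vanishing denominator in the correction step.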

If $r_i = 0$, the estimating equations give no estimate of $\beta_i$. This does not offer difficulty in
a practical sense since, in this case, it is reasonable to conclude that $\beta_i$ is either very large or
else $p_i = 0$. Let us adopt, as a convention, the estimate, $\hat{\beta}_i = \infty$, meaning that $\beta_i$ is very
large when $r_i = 0$. In an experimental situation, we expect to choose $n$ and $T$ large enough so
that the probability that $r_1 = 0$ or $r_2 = 0$ is very small. Obviously, we cannot expect to
obtain information on the failure parameters unless we are willing to test until some failures
are observed.


Case B. Relative magnitude of subpopulation parameters known 


In many practical situations the experimenter knows the relative magnitude of $\beta_1$ and $\beta_2$.
Without loss of generality, let us assume that $\beta_1 < \beta_2$. The maximum likelihood method
described in case A produces some estimates for which $\hat{\beta}_1 > \hat{\beta}_2$. When $\hat{\beta}_1 > \hat{\beta}_2$, given $\beta_1 < \beta_2$,
we shall say that a crossover has occurred. Since it is known that $\beta_1 < \beta_2$, it would seem
reasonable to choose $\hat{\beta}_1 = \hat{\beta}_2$ when a crossover occurs. The maximum likelihood estimate of
$\beta_1 = \beta_2 = \beta$ is

\[ \hat{\beta} = \frac{r_1 \bar{x}_1 + r_2 \bar{x}_2 + (n - r)}{r} \tag{4.21} \]
and
\[ \hat{p} = \frac{r_1}{r}. \tag{4.22} \]

Hence the adjusted estimation procedure will be to choose as estimates the solution of
equations (4.10), (4.11), (4.12) and (4.13) unless the estimates form a crossover. If $\hat{\beta}_1 > \hat{\beta}_2$,
assume $\beta_1 = \beta_2 = \beta$ and obtain the adjusted estimates of $\beta$ and $p$ from equations (4.21) and
(4.22).
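
The adjusted procedure amounts to a simple check applied after the case A estimates have been found. A minimal Python sketch follows; the function name and argument order are hypothetical, and the case A estimates are assumed to have been computed already by whatever method is preferred.

```python
def adjusted_estimates(beta1_hat, beta2_hat, p_hat, n, r1, r2, xbar1, xbar2):
    """Return (beta1, beta2, p), pooling the two subpopulations when a crossover occurs."""
    if beta1_hat <= beta2_hat:
        # No crossover: keep the case A maximum likelihood estimates unchanged.
        return beta1_hat, beta2_hat, p_hat
    r = r1 + r2
    beta = (r1 * xbar1 + r2 * xbar2 + (n - r)) / r     # equation (4.21)
    return beta, beta, r1 / r                          # p from equation (4.22)


if __name__ == "__main__":
    # Illustrative numbers only, chosen so that a crossover occurs.
    print(adjusted_estimates(0.9, 0.7, 0.35, n=100, r1=20, r2=30, xbar1=0.4, xbar2=0.3))
```

Times are in units of the termination time $T$, so each of the $(n - r)$ survivors contributes one unit of observed life to the numerator of (4.21).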


5. AN EXAMPLE 


The method of solving the likelihood equations, described above, is utilized in the following 
example. 

The data recorded in Tables 1 and 2 are times to failure for ARC-1 VHF communication 
transmitter-receivers of a single commercial airline. Units which failed were removed from 
the aircraft for maintenance. However, in some cases the apparent failures were unconfirmed, 
exhibiting satisfactory operation upon arrival at the maintenance centre. Practical con- 
siderations make it desirable to estimate the fraction of unconfirmed failures in the popula- 
tion. Hence the sample of failures may be subdivided into confirmed failures, shown 
in Table 1, and unconfirmed failures shown in Table 2. The sample was censored at 
T = 630 hours as it was a general policy of the airline to remove units which had operated
for 630 hours. Histograms plotted for both confirmed failures and unconfirmed failures 
suggest that both subpopulations of failures are exponentially distributed. 
Considering unconfirmed failures as subpopulation (1) and confirmed failures as
subpopulation (2), the data from Tables 1 and 2 yield the following:

\[ n = 369, \quad r_1 = 107, \quad r_2 = 218, \quad r = r_1 + r_2 = 325, \quad n - r = 369 - 325 = 44, \]

\[ \bar{x}_1 = \frac{1}{r_1 T} \sum_{j} t_{1j} = 0.3034862, \qquad \bar{x}_2 = \frac{1}{r_2 T} \sum_{j} t_{2j} = 0.3644677. \]


Table 1. Confirmed failures. Hours to failure for ARC-1 VHF radio transmitter receivers*









 16
392
408
224   16   80
576  128   56
384  256  246
 16   72    8
216  168  184
120  208   32
208  440  104
232   40  112
 72   64   40
168  114  280
536  400   80
224   40   32
464  448  616
 72   56  608
152  328  480
168   40  152
288  168  352
264   96  224
176  176  568
600   40  416
104  168  408
168   80  512
120  320   48
304   40  160
256   40  296
360   80   96














Table 2. Unconfirmed failures. Hours to failure for ARC-1 VHF radio transmitter receivers* 














136  512  136
246   72   80
168  120  616
112   56  184
208  114  480
112   96   64
272  320    8
472
312
 24
 40
112
114
360
168
 88
304   16  320
 24   32  232
456   48   24











* Data supplied through the courtesy of Dr G. R. Herd, Aeronautical Radio, Incorporated. 


Referring to equations (4.10), (4.11) and (4.12), the estimating equations are

\[ \hat{\beta}_1 = 0.3035 + 0.4112k, \tag{5.1} \]
\[ \hat{\beta}_2 = 0.5663 - 0.2018k, \tag{5.2} \]
\[ \hat{p} = 0.2900 + 0.1192k. \tag{5.3} \]
The process of obtaining the iterative solution is simplified by using a table similar to
Table 3.
The first step is to enter Fig. 1 with $\bar{x}_1 = 0.303$ and obtain the first estimate of $\beta_1$, $\beta_{10} = 0.380$.
The corresponding value of $k$, $k_0 = 0.186$, can be obtained from the first estimating equation,
(5.1), and then, utilizing $k_0$, $\beta_{20}$ and $p_0$ can be easily obtained. These values are shown in
row $u = 0$ of Table 3.
Table 3. Record of iterations

$u$    $k_u$    $\beta_{1u}$   $\beta_{2u}$   $p_u$     $v_u$    $g(k_u)$   $D_u$
0     0.186    0.380     0.529     0.312    4.622    0.1779    -0.0081
1      .166     .3718     .5328     .3098   5.024     .1660      .0000
2      .167     .3721     .5326     .3099   5.002     .1666    - .0004
3      .165     .3713     .5330     .3097   5.046     .1654      .0004

The next step is to compute
\[ g(k_0) = \frac{1}{1 + (q_0/p_0) \exp[1/\beta_{10} - 1/\beta_{20}]}, \tag{5.4} \]
and $D_0 = g(k_0) - k_0 = -0.0081$. The value of $k$ which corresponds to the solution of the
maximum likelihood equations will occur when $D = 0$. Since $D$ is positive or zero when
$k = 0$ and negative when $k = 0.186$, the solution for $k$ must be $0 < k < 0.186$. Hence the value
of $k$ for $u = 1$ must be less than 0.186. The change in $k$, $dk$, can now be computed from
equation (4.17).

\[ dk_0 = \frac{D_0}{1 + g(k_0)^2\, (dv_0/dk_0)} = \frac{(-0.0081)}{1 + (0.1779)^2 (-19.04)} = -0.02. \tag{5.5} \]
Hence $k_1 = k_0 + dk_0 = 0.186 - 0.02 = 0.166$,

\[ \hat{\beta}_{11} = 0.3718, \qquad \hat{\beta}_{21} = 0.5328, \qquad \hat{p}_1 = 0.3098. \]


For all practical purposes, these estimates are the maximum likelihood estimates of the
parameters since $D_1 = 0.0000$. A bound on the iteration error can be obtained by calculating
$D$ for $k_2 = 0.167$ and $k_3 = 0.165$. Since $D_2 = -0.0004$ is negative and $D_3 = 0.0004$ is positive
and the solution for $k$ is taken as 0.166, then clearly the absolute value of the iterative error
for $k$ is less than 0.001.
The estimate of the fraction of unconfirmed failures is $\hat{p} = 0.3098$ and their average life is
estimated to be
\[ \hat{\alpha}_1 = \hat{\beta}_1 T = (0.3718)(630) = 234.2 \text{ hours}. \]
The estimate of the average life of the confirmed failures is
\[ \hat{\alpha}_2 = \hat{\beta}_2 T = (0.5328)(630) = 335.7 \text{ hours}. \]


It should be noted that a fairly accurate solution for the estimation equations was obtained 
with only one iteration. However, Norton (1956) gives a warning that one or two iterations 
on maximum likelihood equations may not be sufficient. Therefore, it would seem desirable 
to place bounds on the iteration error as was done in this example. All calculations, including 
those for the two boundary values, were made on a desk calculator in less than half an hour.
An iterative scheme for an automatic computer can be programmed to deliver a much more
accurate solution in a minute or less time.

It was previously mentioned that the parameter of primary interest is $p$. The estimates of
the average life of units from the two subpopulations, $\hat{\alpha}_1$ and $\hat{\alpha}_2$, may be useful in anticipating
maintenance requirements. In any case, this example represents an unusual and interesting
application of the methods of estimation for mixed exponentially distributed failure
populations.
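
The device of bounding the iteration error by a sign change in $D$ can also be automated. The following Python sketch brackets the root of $D(k) = g(k) - k$ by bisection, using the sample quantities quoted in this example. It assumes the same form of the estimating equations as (5.1) to (5.3), written in terms of $n$, $r_1$, $r_2$, $\bar{x}_1$ and $\bar{x}_2$; the function names and the tolerance are hypothetical.

```python
import math


def d_of_k(k, n, r1, r2, xbar1, xbar2):
    """D(k) = g(k) - k, with p, beta1 and beta2 expressed as linear functions of k."""
    s = n - r1 - r2                                  # survivors, n - r
    p = (r1 + k * s) / n
    b1 = xbar1 + k * s / r1
    b2 = xbar2 + (1.0 - k) * s / r2
    v = ((1.0 - p) / p) * math.exp(1.0 / b1 - 1.0 / b2)
    return 1.0 / (1.0 + v) - k


def bracket_root(lo, hi, args, tol=1e-6):
    """Shrink [lo, hi] around a sign change of D until its width is at most tol."""
    d_lo = d_of_k(lo, *args)
    if d_lo * d_of_k(hi, *args) > 0:
        raise ValueError("D has the same sign at both ends of the interval")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        d_mid = d_of_k(mid, *args)
        if d_lo * d_mid > 0:
            lo, d_lo = mid, d_mid                    # root lies in the upper half
        else:
            hi = mid                                 # root lies in the lower half
    return lo, hi


if __name__ == "__main__":
    sample = (369, 107, 218, 0.3034862, 0.3644677)
    print(bracket_root(0.165, 0.167, sample))
```

The width of the returned interval is then a direct bound on the error in $k$, in the same spirit as the two boundary values computed by hand above.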

6. PROPERTIES OF THE ESTIMATES 


Small sample properties of the estimates were obtained by empirical sampling for various
parameter points, a parameter point being identified as a specific combination of $n$, $\beta_1$, $\beta_2$,
and $p$. Fifty samples were drawn at each parameter point and estimates computed by both
the maximum likelihood and the adjusted estimation procedures. The means of the parameters,
based on $N = 50$ samples, for the two estimation procedures are presented in
Table 4. The corresponding estimated variances, computed from the formula
| a 

x (9, —9)? 

S3 = a (6-1) 
are given in Table 5. The symbol A is used to denote the number of crossovers per group of
$N = 50$ samples, while the subscript A is used to identify the estimated means and variances
for the adjusted estimation procedure. The expected value of $r_i$, $E(r_i)$, is given also.

The properties of the estimates were investigated primarily at parameter points where the
bias and variance of the estimates might be expected to be large. At first glance, a sample
size of $n = 100$ may seem large but this view must be tempered by consideration of the test
termination time $T$. As $T$ approaches zero, the number of observed failures diminishes until,
in an extreme case, no failures may be observed. In this latter case, obviously very little
information can be obtained from the sample. The values of $E(r_i)$ are a better indication of
the efficiency of estimation, although the efficiency of estimating $\beta_i$ is approximately
proportional to $E(r_i)$ only when $\beta_i$ is very small. Hence one should consider the relative
magnitude of $\alpha_i$ and $T$, expressed as the ratio $\beta_i = \alpha_i/T$, as well as the sample size, $n$, when
making comparisons.

An examination of Tables 4 and 5 reveals that, in general, estimation was poorest, both 
with respect to bias and with respect to variance, for those parameter points at which a large 
number of crossovers occurred. As would be expected the use of the adjusted procedure for 
the crossover cases brings about a substantial improvement. 

For parameter point number 6, for which $\beta_1 = 0.2$, $\beta_2 = 0.6$, $p = 0.2$ and $n = 100$, a total of
209 samples was generated. The results are shown in histogram form in Fig. 2.

The large sample variances of the estimates can be obtained by inverting the symmetric 
information matrix, J, where 
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514 Parameters of mixed exponentially distributed failure time distributions 
The asymptotic variances, indicated by the symbol $\sigma^2_{\hat{\theta}}$, along with the estimated variances
obtained from empirical sampling are given in Table 6 for parameter points 3, 6, 14, 19 and
22. For parameter points 6 and 19 the agreement between empirical and asymptotic
variances is remarkably good. In both cases $\beta_1$ and $\beta_2$ are relatively small. For points 3, 14
and 22 the empirical variances are much larger than the corresponding asymptotic variances
suggesting that, for those combinations of $\beta_1$, $\beta_2$, and $p$, $n$ was not sufficiently large for
asymptotic conditions to hold.

The effect of test termination time on the efficiency of the estimates is of considerable
importance. Some light may be shed on this question by examination of the asymptotic
variance-covariance matrix of $\hat{\alpha}_1$, $\hat{\alpha}_2$ and $\hat{p}$, even though it is undoubtedly true that for some
regions of the parameter space the asymptotic results are quite different from the actual
finite sample results.

With time measured in its original units, $t_{ij}$, the information matrix of $\hat{\alpha}_1$, $\hat{\alpha}_2$ and $\hat{p}$ may
be shown to be














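\[ I(\alpha_1, \alpha_2, p) = n \begin{bmatrix}
\dfrac{pF_1(T)}{\alpha_1^2}\,[1 - A] & \dfrac{G(T)\,k(1 - k)\,T^2}{\alpha_1^2 \alpha_2^2} & -\dfrac{G(T)\,k(1 - k)\,T}{pq\,\alpha_1^2} \\[2ex]
 & \dfrac{qF_2(T)}{\alpha_2^2}\,[1 - B] & \dfrac{G(T)\,k(1 - k)\,T}{pq\,\alpha_2^2} \\[2ex]
 & & \dfrac{1}{pq}\,[1 - C]
\end{bmatrix}, \tag{6.3} \]
where
\[ A = \frac{(1 - k)\,G_1(T)\,T^2}{\alpha_1^2 F_1(T)}, \qquad B = \frac{k\,G_2(T)\,T^2}{\alpha_2^2 F_2(T)}, \qquad C = \frac{k(1 - k)\,G(T)}{pq}. \]
The variance-covariance matrix of $\hat{\alpha}_1$, $\hat{\alpha}_2$ and $\hat{p}$, $I^{-1}(\hat{\alpha}_1, \hat{\alpha}_2, \hat{p})$, is
\[ I^{-1}(\hat{\alpha}_1, \hat{\alpha}_2, \hat{p}) = \frac{1}{nD} \begin{bmatrix}
\dfrac{\alpha_1^2}{pF_1(T)}\,[1 - B - C] & -\dfrac{k(1 - k)\,G(T)\,T^2}{pq\,F_1(T)F_2(T)} & \dfrac{k(1 - k)\,G(T)\,T}{pF_1(T)} \\[2ex]
 & \dfrac{\alpha_2^2}{qF_2(T)}\,[1 - A - C] & -\dfrac{k(1 - k)\,G(T)\,T}{qF_2(T)} \\[2ex]
 & & pq\,[1 - A - B]
\end{bmatrix}, \tag{6.4} \]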
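where $D = 1 - (A + B + C)$.
The off-diagonal terms of $I^{-1}(\hat{\alpha}_1, \hat{\alpha}_2, \hat{p})$ approach zero as $T$ approaches infinity, implying
uncorrelated estimates of $\alpha_1$, $\alpha_2$, and $p$ when the sample is uncensored. Also,

\[ \lim_{T \to \infty} A = \lim_{T \to \infty} B = \lim_{T \to \infty} C = 0 \]

and, consequently, $\lim_{T \to \infty} D = 1$. Obviously $F_1(\infty) = F_2(\infty) = 1$.
Therefore,
\[ \lim_{T \to \infty} \sigma^2_{\hat{\alpha}_1} = \frac{\alpha_1^2}{np}, \tag{6.5} \]
\[ \lim_{T \to \infty} \sigma^2_{\hat{\alpha}_2} = \frac{\alpha_2^2}{nq}, \tag{6.6} \]
\[ \lim_{T \to \infty} \sigma^2_{\hat{p}} = \frac{pq}{n}. \tag{6.7} \]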


From an intuitive point of view, the larger the value of $T$, the greater will be the amount
of information on $\alpha_1$, $\alpha_2$ and $p$. In fact, if the sample were not censored and hence $T$ were
infinitely large, a sufficient statistic for $p$ would be the binomial estimate, $\hat{p} = r_1/n$, and its
variance would be $pq/n$. It is therefore not surprising that, in the limit, $\sigma^2_{\hat{p}}$ approaches $pq/n$.

The behaviour of the ratio of $\sigma^2_{\hat{p}}$ to its limiting value, $(pq/n)$, as a function of $T$ is shown in
Figs. 3a and 3b. In the first figure $p = 0.05$, in the second $p = 0.30$. Curves are shown for
$\alpha_1 = 0.25$ and $\alpha_1 = 1.0$, with $\alpha_2 = 1.0$ in all cases. When $\alpha_1 = \alpha_2$ the curves can be shown to
be independent of $p$.
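
Curves of this kind can be recomputed directly. The Python sketch below evaluates the ratio of the asymptotic variance of $\hat{p}$ to $pq/n$ as a function of $T$. It rests on two assumptions that should be checked against the matrices above: that $A$, $B$, $C$ and $D$ are as defined for (6.3) and (6.4) with $k = pG_1(T)/G(T)$, and that the relevant diagonal element of the variance-covariance matrix is $pq(1 - A - B)/(nD)$, so that the ratio is $(1 - A - B)/D$. The function name is hypothetical.

```python
import math


def variance_ratio_p(T, p, alpha1, alpha2):
    """Ratio of the asymptotic variance of p-hat to its uncensored limit pq/n."""
    q = 1.0 - p
    g1 = math.exp(-T / alpha1)               # G_1(T)
    g2 = math.exp(-T / alpha2)               # G_2(T)
    g = p * g1 + q * g2                      # G(T)
    f1, f2 = 1.0 - g1, 1.0 - g2              # F_1(T), F_2(T)
    k = p * g1 / g                           # conditional mixture proportion at T
    a = (1.0 - k) * g1 * T ** 2 / (alpha1 ** 2 * f1)
    b = k * g2 * T ** 2 / (alpha2 ** 2 * f2)
    c = k * (1.0 - k) * g / (p * q)
    d = 1.0 - (a + b + c)
    return (1.0 - a - b) / d


if __name__ == "__main__":
    # Parameter values taken from the description of Figs. 3a and 3b.
    for T in (1.0, 2.0, 4.0, 8.0):
        print(T, variance_ratio_p(T, 0.05, 0.25, 1.0), variance_ratio_p(T, 0.30, 0.25, 1.0))
```

Two properties stated in the text serve as checks on the sketch: the ratio tends to one as $T$ grows, and when $\alpha_1 = \alpha_2$ it does not depend on $p$.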













































The asymptotic variance of the maximum likelihood estimate of the parameter $\alpha$ in the
case of a single exponential distribution, with sampling censored at time $T$, is
\[ \sigma^2_{\hat{\alpha}} = \frac{\alpha^2}{E(r)}. \tag{6.8} \]
... the sum of the number of survivors from the two subpopulations, $(n - r)$, is known. Therefore, it
would be reasonable to expect
\[ \sigma^2_{\hat{\alpha}_i} \ge \frac{\alpha_i^2}{E(r_i)}. \tag{6.9} \]

Noting that $A$, $B$, and $C$ are always positive, it is obvious that $\sigma^2_{\hat{\alpha}_i}$ is equal to the product of
$\alpha_i^2/E(r_i)$ and a coefficient which is always greater than or equal to one when $D > 0$. Hence
when $D > 0$, the inequality (6.9) holds. $D$ will be less than or equal to zero only when $T$ is very
small and this case is of little interest from a practical standpoint.

Figs. 3a and 3b also show the ratio of $\sigma^2_{\hat{\alpha}_2}$ to its limiting value, $\alpha_2^2/(nq)$. Again $\alpha_2$ was
assumed equal to 1.0 and only the $\alpha_1 = 0.25$ and $\alpha_1 = 1.0$ results were plotted.

The curves in Figs. 3a and 3b could be used to help decide whether or not a given increase
in termination time, $T$, will yield enough additional information to off-set the cost of the
increased testing time.


7. RELATION TO RESULTS FOR A SINGLE EXPONENTIALLY DISTRIBUTED POPULATION 


The estimating equations for the mixed exponentially distributed subpopulations are obviously 
a logical extension of the results for the case of a single exponentially distributed 
failure population. The maximum likelihood estimate of the parameter, $\alpha$, of a single 
population is 

$$\hat{\alpha} = \frac{1}{r}\,(\text{total observed life}),$$    (7·1) 

where the total observed life is 

$$\sum_{j=1}^{r} t_j + (n-r)T.$$    (7·2) 

The observed life for the mixed populations can be divided into three portions, namely 

$$\sum_{j=1}^{r_1} t_{1j}, \qquad \sum_{j=1}^{r_2} t_{2j}, \qquad \text{and} \qquad (n-r)T.$$ 


The first two quantities obviously can be allocated to their appropriate subpopulations. 
However, since we do not know how many of the $(n-r)$ non-failures belong to each 
subpopulation, we cannot be certain of properly dividing the observed life, represented by 
$(n-r)T$, between the two subpopulations. It is therefore necessary to estimate the portion 
to be assigned to each subpopulation. The expected number of the $(n-r)$ items belonging to 
subpopulation (1) is $k(n-r)$. It therefore would seem reasonable to apportion $k(n-r)T$ to 
subpopulation (1) and $(1-k)(n-r)T$ to subpopulation (2). Since the estimating equations, 
(4·11) and (4·12), can obviously be re-written as 

$$\hat{\alpha}_1 = \frac{\sum_{j=1}^{r_1} t_{1j} + \hat{k}(n-r)T}{r_1},$$    (7·3) 

$$\hat{\alpha}_2 = \frac{\sum_{j=1}^{r_2} t_{2j} + (1-\hat{k})(n-r)T}{r_2},$$    (7·4) 

we thus see that, just as in the case of the single population, our estimates can be expressed 
as 'total life' divided by number of observed failures. 


8. RESULTS FOR A MORE GENERAL MODEL 


Consider a mixture of $s$ failure subpopulations, mixed in proportion $p_1 : p_2 : \ldots : p_s$, where 
$0 < p_i < 1$ and $\sum_{i=1}^{s} p_i = 1$. Unless otherwise indicated, in this section $i$ ranges from $i = 1$ to 
$i = s$ and $j$ ranges, as before, from $j = 1$ to $j = r_i$. Assume that the subpopulations are 
distributed according to a distribution function $F_i(t; \theta_1, \theta_2, \ldots, \theta_m)$ which is independent of 
$p_1, p_2, \ldots, p_s$. Then the cumulative distribution function is 

$$F(t) = \sum_{i=1}^{s} p_i F_i(t).$$    (8·1) 


A random sample of $n$ units is tested to time $t = T$. The number of units, $r_i$, belonging to 
the $i$th subpopulation and failing before time $T$, is recorded along with the actual failure 
times, $t_{ij}$. Let $\sum_{i=1}^{s} r_i = r$. Then $(n-r)$ units, which cannot be identified as to subpopulation, 
survive the test. 

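Assume that all measurements are in units of size $T$ and let $x = t/T$ and $\beta_i = \alpha_i/T$. The 
conditional mixture proportion, defined in § 2, is 

$$p_i(x) = \frac{p_i G_i(x)}{G(x)},$$    (8·2) 

where $G(x) = 1 - F(x)$, $G_i(x) = 1 - F_i(x)$, $0 < p_i(x) < 1$, 
$\sum_{i=1}^{s} p_i(x) = 1$, $k_i = p_i(1)$, $f_i(x) = dF_i(x)/dx$. 
The likelihood $L$ is 

$$L = \frac{n!}{r_1!\, r_2! \cdots r_s!\,(n-r)!}\,[G(1)]^{n-r} \prod_{i=1}^{s} p_i^{r_i} \prod_{i=1}^{s}\prod_{j=1}^{r_i} f_i(x_{ij}).$$    (8·3) 

It follows that the first partial derivative of $\ln L$ with respect to $p_i$ is 

$$\frac{\partial \ln L}{\partial p_i} = (n-r)\frac{\partial\{\ln G(1)\}}{\partial p_i} + \frac{r_i}{p_i} - \frac{r_s}{p_s} = (n-r)\left[\frac{k_i}{p_i} - \frac{k_s}{p_s}\right] + \frac{r_i}{p_i} - \frac{r_s}{p_s}.$$    (8·4) 

Setting the $\partial \ln L/\partial p_i$ equal to zero and simplifying, 

$$\hat{p}_i = \frac{r_i}{n} + \frac{(n-r)\hat{k}_i}{n}.$$    (8·5) 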

The likelihood equations for $\hat{p}_i$ are thus linear functions of the estimates of the corresponding 
conditional mixture proportions, $k_i = p_i(1)$, regardless of the distributional form of $F_i(x)$. 

It would be desirable to consider a distribution function, $F_i(x)$, more general in form than 
the exponential in order to provide a model which will fit a larger set of failure populations. 
The frequent occurrence of exponentially distributed failure populations makes it desirable 
to choose a family of distributions for which the exponential would be a special case. The 
family of distributions represented by the Weibull function, 

$$F_i(x) = 1 - \exp[-(x/\beta_i)^{m_i}],$$    (8·6) 

and 

$$f_i(x) = \frac{m_i}{\beta_i}\left(\frac{x}{\beta_i}\right)^{m_i - 1} \exp\left[-\left(\frac{x}{\beta_i}\right)^{m_i}\right],$$    (8·7) 

is one possibility since $F_i(x)$ is the exponential distribution when $m_i = 1$. Assuming various 
values for the shape parameter, $m_i$, provides a wide range of distributional shapes from 
which to choose. 
Let 

$$F(x) = \sum_{i=1}^{s} p_i F_i(x),$$ 

where $F_i(x) = 1 - \exp[-(x/\beta_i)^{m_i}]$ 


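and the values of $m_i$ are assumed known. Taking the first partial derivative of $\ln L$ with 
respect to $\beta_i$ yields 

$$\frac{\partial \ln L}{\partial \beta_i} = \frac{m_i (n-r) k_i}{\beta_i^{m_i+1}} - \frac{m_i r_i}{\beta_i} + \frac{m_i \sum_{j=1}^{r_i} x_{ij}^{m_i}}{\beta_i^{m_i+1}}.$$ 

Setting the $\partial \ln L/\partial \beta_i$ equal to zero and simplifying, 

$$\hat{\beta}_i^{m_i} = \frac{1}{r_i}\left(\sum_{j=1}^{r_i} x_{ij}^{m_i} + (n-r)\hat{k}_i\right).$$ 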


When $s = 2$ and $m_1 = m_2 = 1$, the estimating equations reduce to those obtained for the 
case of two exponentials. 

It should not be too difficult to set up an iterative method for the solution of the maximum 
likelihood equations, regardless of the shape parameters, $m_i$, as long as the number of 
subpopulations, $s$, is small. When $s = 2$, the estimating equations can be solved for $\hat{\beta}_1^{m_1}$, $\hat{\beta}_2^{m_2}$ and 
$\hat{p}$ by the methods given in § 4. In general, there will be $(2s-1)$ equations to solve for the 
same number of parameters. The procedure used for $s = 2$ was to reduce the three equations, 
by substitution, to a single equation 

$$k = g(k),$$ 

and then determine the value of $k$ such that $g(k) - k = 0$. In the general case, it is possible to 
reduce the $(2s-1)$ equations to $s-1$ equations of the form 

$$g_i(K) - k_i = 0 \qquad (i = 1, 2, \ldots, s-1),$$ 

where $k_i = p_i(T)$ is the conditional mixture proportion for subpopulation ($i$) at time $t = T$ 
and $K = (k_1, k_2, \ldots, k_{s-1})$. The iteration method would then involve the selection of a vector $K$ 
that is the solution of the $(s-1)$ simultaneous equations. If a unique maximum likelihood solution 
exists, the solution will be a point in a restricted region within a unit cube in the $(s-1)$ 
dimensional space of $K$ since 

$$0 < k_i < 1 \quad \text{and} \quad \sum_{i=1}^{s} k_i = 1.$$ 

A digital computer would have little difficulty in locating the solution by trial and error 
when $s$ is small. More efficient procedures for solving the equations by iterative methods are 
given in texts on numerical methods. Once $K$ is determined, the estimates of $\beta_i$ and $p_i$ can 
be obtained from the original maximum likelihood equations. 
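
The reduction to a single equation k = g(k) for s = 2 can be sketched in code. The following is only an illustration under stated assumptions, not the authors' computing procedure: the function names, the made-up failure times and the use of bisection to locate a root of g(k) - k are all choices made here. Times are expressed in units of T, so the censoring point is x = 1, and the shape parameters default to 1 (the exponential case).

```python
import math

def estimates_from_k(k, x1, x2, n, m1=1.0, m2=1.0):
    """Likelihood-equation estimates (p, beta1, beta2) for a trial value of k.

    k is the conditional mixture proportion of subpopulation 1 among the
    n - r survivors; x1 and x2 hold the observed failure times (units of T).
    """
    r1, r2 = len(x1), len(x2)
    survivors = n - r1 - r2
    p_hat = (r1 + survivors * k) / n
    beta1_pow = (sum(x ** m1 for x in x1) + survivors * k) / r1
    beta2_pow = (sum(x ** m2 for x in x2) + survivors * (1.0 - k)) / r2
    return p_hat, beta1_pow ** (1.0 / m1), beta2_pow ** (1.0 / m2)

def g(k, x1, x2, n, m1=1.0, m2=1.0):
    """Conditional mixture proportion at x = 1 implied by the estimates for k."""
    p_hat, b1, b2 = estimates_from_k(k, x1, x2, n, m1, m2)
    surv1 = p_hat * math.exp(-(1.0 / b1) ** m1)          # p * G_1(1)
    surv2 = (1.0 - p_hat) * math.exp(-(1.0 / b2) ** m2)  # (1 - p) * G_2(1)
    return surv1 / (surv1 + surv2)

def solve_k(x1, x2, n, m1=1.0, m2=1.0, tol=1e-12):
    """Bisection on g(k) - k over (0, 1); g(0) > 0 and g(1) < 1 bracket a root."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid, x1, x2, n, m1, m2) > mid:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical data: 12 units on test, 4 + 5 identified failures, 3 survivors.
x1 = [0.10, 0.15, 0.20, 0.30]
x2 = [0.40, 0.50, 0.60, 0.70, 0.90]
n = 12
k_hat = solve_k(x1, x2, n)
p_hat, beta1_hat, beta2_hat = estimates_from_k(k_hat, x1, x2, n)
print(k_hat, p_hat, beta1_hat, beta2_hat)
```

Bisection finds a root of g(k) - k but says nothing about uniqueness; if several roots exist, the likelihood would have to be compared at each of them.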


9. CONCLUSION 
The maximum likelihood estimation procedure appears to give satisfactory results when the 
sample size $n$ is large and the test termination time, $T$, is large relative to $\alpha_1$ and $\alpha_2$. When $n$ 
and $T$ are small, the estimates are badly biased and have large variances. It would thus 
seem desirable to investigate the use of other estimators having better small sample 
properties. 

In most experimental situations the relative magnitude of $\alpha_1$ and $\alpha_2$ will be known. When 
this is true it is possible to modify the estimating procedure in a simple way and thereby 
substantially reduce the bias and variances of the estimates. 


The authors wish to thank Mr J. Durbin and Mr A. Stuart for reading the manuscript 
and for their helpful suggestions. 
A BIBLIOGRAPHY ON LIFE TESTING AND RELATED TOPICS 


By WILLIAM MENDENHALL 


Research Techniques Unit, London School of Economics and Institute of Statistics, 
North Carolina State College 


The following bibliography covers papers concerned with statistical theory and methods 
applicable to the study of the life characteristics of some biological or physical body. For 
instance, we may desire knowledge concerning the life characteristics of a certain type of 
electronic tube, a type of bacteria, or perhaps of a complex system such as a high-speed 
digital computer. In general, information is obtained either as the result of a planned 
laboratory experiment or through the analysis of data obtained from service use of the 
product. We will loosely describe either of these situations as a life test. In this bibliography 
the major emphasis will be given to the industrial applications of life and fatigue tests with 
related statistical theory, although many of the statistical methods developed for life tests 
of industrial products are applicable to studies of life in other fields. 

One might ask why statistical methods developed for analysing data from other fields of 
experimentation are not applicable in analysing the results of a life test. Why consider 
life testing as a special topic? If a random sample of items drawn from a population is 
tested until all items fail, conventional statistical techniques may be employed although, 
in many cases, the assumption that the underlying frequency distributions are normal 
(Gaussian) will not be satisfied. Failure of the data to satisfy the assumption of normality 
may present difficulties, although many statistical tests have been shown to be robust to 
the failure of the data to satisfy this assumption. A second difficulty encountered in using 
conventional experimental methods in industrial life tests is the expense and time involved 
in waiting for all of the test items to fail. For this reason, many industrial tests are con- 
cluded before all of the test items fail. A sample obtained from such a test is said to be 
censored. By censoring, a given amount of information can be obtained in a shorter time 
at the expense of testing more items. Testing time may also be reduced by the use of an 
accelerated life test in which the sample is subjected to stress conditions excessive to those 
encountered in its normal environment and which reduce the life of the product. Careful 
justification of accelerated life tests requires the experimenter to establish the relation 
between life characteristics under accelerated and normal stress conditions. Maximum 
utilization of test equipment and time saving may also be obtained by the replacement, or 
renewal, of a test item by a new item immediately upon failure. An additional and 
advantageous characteristic of life test data is that the observations are ordered in time 
and thus permit the use of order statistics. That is, the smallest observation is the first 
observed, the second smallest, the next, etc. Therefore, we shall be concerned with 
statistical theory which applies to non-normal as well as to normal distributions, censored 
sampling and sampling with renewal, order statistics, and the treatment of data obtained 
under accelerated conditions. Very little has been published in connexion with the last 
topic. Other topics of considerable interest in the study of life characteristics are extreme 
value theory, fatigue and wear tests, systems reliability, machine productivity problems, 
and the treatment of sensitivity data. 
The references have been classified as belonging to one or more of nine subject groups. 
They are presented in alphabetical order by author with the appropriate subject classifica- 
tion indicated by capital letters in parentheses at the end of the reference. Some of the 
subject classifications are obviously subtopics of others. For instance, an extreme value is 
an order statistic, but the theory of extreme values is of sufficient importance to warrant 
a separate classification. Similar overlapping occurs with censored sampling, since many 
of the methods used in the treatment of censored samples are based on order statistics. 
These references will be found under censored sampling and will not be repeated under order 
statistics. The choice of papers was governed by their relevancy to the possibility of appli- 
cation to the study of life characteristics. Some of the groups are represented only by a 
few of the more important papers whose bibliographies may be consulted for additional 
references. In general, the main body of Continental literature is represented only by 
papers revealed in the bibliographies of other papers. Papers which do not fall specifically 
into one of the nine groups are unclassified. 

The nine subject groups are as follows: 


C. Censored sampling or sampling from a truncated distribution, both univariate and 
   multivariate. 

R. Sampling with renewal. 

O. Order statistics. (Excluding papers on extreme value theory or censored sampling.) 

X. Extreme value theory. 

A. Papers concerned with failure rates and conditional failure density. Also included 
   are tests of randomness for a sequence of events occurring in time. 

F. Fatigue testing and wear problems. 

M. Machine productivity problems. This group includes some representative papers 
   concerned with machine productivity of a group of machines subject to various failure 
   laws and servicing arrangements. 

S. System reliability. 

D. Methods applicable to sensitivity data. These papers primarily deal with the fitting 
   of dosage-mortality curves although some refer explicitly to the use of these methods 
   in the testing of explosives, etc. 


For more extensive bibliographies on order statistics, the reader is referred to 'Order 
Statistics' (1948, by S. S. Wilks, Bull. Amer. Math. Soc. 54, 6) and 'Bibliography of 
Nonparametric Statistics and Related Topics' by I. R. Savage (1953, J. Amer. Statist. 
Ass. 48, 844). Papers concerned with machine productivity and the servicing of automatic 
machines are included in 'A bibliography on the theory of queues' (1957, by A. Doig, 
Biometrika, 44, 490). Two bibliographies on system reliability which contain both statistical 
and non-statistical papers are 'A Summary of Reliability Literature' (1956, by C. G. 
Moore, Jr., Naval Electronics Service Unit, Washington, D.C.) and 'Literature Guide on 
Failure Control and Reliability' (1956, by W. F. Luebbert; Technical Report No. 13, 
Stanford Electronic Laboratories, Stanford University). 


The author wishes to thank Dr W. R. Buckland, Dr D. R. Cox, and Dr G. R. Herd, as 
well as other persons for helpful comments and bibliographical material. In particular, 
thanks are due to Prof. M. G. Kendall, Director of the Research Techniques Unit, London 
School of Economics, for direction and support of this work. 
MISCELLANEA 


A two sample distribution free test for comparing variances 


By BALKRISHNA V. SUKHATME 
Indian Council of Agricultural Research, New Delhi 


Introduction. Two sample tests of a non-parametric nature have been proposed by various authors for 
the problem of testing the equality of variances, particularly in papers by Mood (1954), Lehmann (1951) 
and Sukhatme (1957). They discuss the consistency and power properties of these tests. It has been 
shown by Sukhatme (1957) that some of these tests are reasonably efficient for normal alternatives and 
highly efficient for some non-normal alternatives. 

These non-parametric tests are, however, of limited application in the sense that they presuppose 
knowledge about the relative location of the two populations which is not always available. In the latter 
case, the test can be modified by applying it to the deviations of the observations from the sample 
medians rather than to the observations themselves. The modified test is essentially the same test and 
we would expect it to behave similarly to the original test at least for large samples. On examination it was 
found (Sukhatme, 1958) that this property is not shared by Mood’s test which was thought to be a good 
competitor to the variance ratio F test. A new test satisfying the above property was therefore proposed 
by the author (Sukhatme, 1957). The test is based on the statistic 


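$$T = \frac{1}{mn} \sum_{i=1}^{m} \sum_{j=1}^{n} K(X_i, Y_j),$$ 

where 

$$K(X, Y) = \begin{cases} 1 & \text{if either } 0 < X < Y \text{ or } Y < X < 0, \\ 0 & \text{otherwise.} \end{cases}$$ 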


It is seen that the test statistic is based on sets of two observations, one from each of the two samples. 

In this paper, we propose a new non-parametric test for comparing variances and derive a general 
formula for its asymptotic relative efficiency with respect to the variance ratio F test for scalar alterna- 
tives but almost arbitrary continuous distributions. It will be seen that considerable improvement in 
power is attained if additional information obtained by considering sets of three observations is also 
taken into account. The test will be shown to be consistent. This test also presupposes knowledge about 
the relative location of the two populations. It will be shown that under certain regularity conditions, 
the proposed test after modification is asymptotically distribution free. 


1. The proposed S test. Let $X_1, X_2, \ldots, X_m$ and $Y_1, Y_2, \ldots, Y_n$ be two samples of independent 
observations drawn from populations with cumulative distribution functions $F(x)$ and $G(x)$, respectively. No 
knowledge is assumed concerning $F$ and $G$ except that they are absolutely continuous and that they 
differ in the scale parameter only. The test statistic for testing the hypothesis $H$: $F = G$ against the 
alternative $A$: $F \ne G$, may then be defined as 

$$S = \sum_{i=1}^{n} \sum_{j \ne k} Q(X_j, X_k, Y_i) + 2 \sum_{i \ne p} \sum_{j=1}^{m} Q(X_j, Y_p, Y_i) + (\tfrac{1}{2}s - 1) \sum_{i=1}^{m} \sum_{j=1}^{n} K(X_i, Y_j),$$    (1·1) 

where $s = m + n$, 

$$K(u, v) = \begin{cases} 1 & \text{if either } 0 < u < v \text{ or } v < u < 0, \\ 0 & \text{otherwise,} \end{cases}$$ 

and 

$$Q(u, v, w) = \begin{cases} 1 & \text{if either } 0 < u < w,\ 0 < v < w \text{ or } w < u < 0,\ w < v < 0, \\ 0 & \text{otherwise.} \end{cases}$$ 

We reject the hypothesis if $S$ is either too large or too small. 
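
The definition of S can be checked on a small sample. The sketch below is illustrative only: the function names and data are invented here, the coefficient of the pair term is read as s/2 - 1 (the printed coefficient is hard to make out, so this is an assumption), and the rank version assumes no ties and takes the ranks of negative observations in algebraic order (most negative first), which is also an assumption about the intended convention.

```python
from itertools import permutations

def K(u, v):
    return 1 if (0 < u < v) or (v < u < 0) else 0

def Q(u, v, w):
    return 1 if (0 < u < w and 0 < v < w) or (w < u < 0 and w < v < 0) else 0

def s_statistic(xs, ys):
    """S computed directly from the sums over triples and pairs."""
    s = len(xs) + len(ys)
    # ordered pairs (j, k), j != k, of X's combined with each Y_i
    xxy = sum(Q(xj, xk, yi) for yi in ys for xj, xk in permutations(xs, 2))
    # ordered pairs (i, p), i != p, of Y's combined with each X_j
    xyy = sum(Q(xj, yp, yi) for yi, yp in permutations(ys, 2) for xj in xs)
    pairs = sum(K(x, y) for x in xs for y in ys)
    return xxy + 2 * xyy + (s / 2 - 1) * pairs

def s_from_ranks(xs, ys):
    """S = S1 + S2 from the ranks of the Y's among positives and negatives."""
    s = len(xs) + len(ys)
    pos = sorted(v for v in xs + ys if v > 0)
    neg = sorted(v for v in xs + ys if v < 0)
    r1 = [pos.index(y) + 1 for y in ys if y > 0]   # ranks among positives
    r2 = [neg.index(y) + 1 for y in ys if y < 0]   # ranks among negatives
    n1, n2, s2 = len(r1), len(r2), len(neg)
    m2 = s2 - n2
    S1 = (sum(r * r for r in r1) + (s - 8) / 2 * sum(r1)
          - n1 * (n1 + 1) * (3 * s + 4 * n1 - 22) / 12)
    S2 = (sum(r * r for r in r2) - (s + 4 * s2 - 4) / 2 * sum(r2)
          + m2 * n2 * (s + 2 * n2 + 2 * s2 - 4) / 2
          + n2 * (n2 + 1) * (3 * s + 8 * n2 - 14) / 12)
    return S1 + S2

xs_a, ys_a = [1, 2, -1], [3, -2, 0.5]
xs_b, ys_b = [0.3, -1.2, 2.5, -0.4, 1.1], [-0.7, 0.9, -2.0, 1.8]
print(s_statistic(xs_a, ys_a), s_from_ranks(xs_a, ys_a))
print(s_statistic(xs_b, ys_b), s_from_ranks(xs_b, ys_b))
```

The direct version costs on the order of mn(m + n) evaluations, whereas the rank version needs only a sort, which is the practical appeal of expressing S through ranks.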
An alternative expression for the test statistic 

Let $r'_i$ be the rank of the $i$th positive observation on $Y$ in order of magnitude in the combined sample 
of positive observations on $X$ and $Y$, and $r''_i$ the rank of the $i$th negative observation on $Y$ in order of 
magnitude in the combined sample of negative observations on $X$ and $Y$. Further, let 

$n'$ = number of positive observations on $Y$, $n'' = n - n'$, 
$m'$ = number of positive observations on $X$, $m'' = m - m'$, 
$s' = m' + n'$, $s'' = m'' + n''$, $s = m + n$. 

Then it can be shown that 

$$S = S_1 + S_2,$$    (1·2) 

where 

$$S_1 = \sum_{i=1}^{n'} r_i'^2 + \frac{(s-8)}{2} \sum_{i=1}^{n'} r'_i - \frac{n'(n'+1)(3s + 4n' - 22)}{12}$$ 

and 

$$S_2 = \sum_{i=1}^{n''} r_i''^2 - \frac{(s + 4s'' - 4)}{2} \sum_{i=1}^{n''} r''_i + \frac{m''n''(s + 2n'' + 2s'' - 4)}{2} + \frac{n''(n''+1)(3s + 8n'' - 14)}{12}.$$ 

This expression for the statistic $S$ seems to be more convenient. In the subsequent development, we will 
use either of the two expressions as the case may be. 


2. Expectation and variance of S under the hypothesis. 

$$ES = ES_1 + ES_2 = E[E[S_1 \mid n', m']] + E[E[S_2 \mid n'', m'']],$$    (2·1) 

$$\operatorname{var} S = E[\operatorname{var}[S_1 \mid n', m']] + E[\operatorname{var}[S_2 \mid n'', m'']] + \operatorname{var}[E[S \mid n', m']].$$    (2·2) 

For fixed $n'$ and $m'$, we have 
n’ 4 of 1 n’ To’ 1 2. , 1 4 
ge, Bee 
i=1 2 i=1 6 
E 5 r3 = n'e'(s" ah i E 5 rt = ni(s’ + 1) (68 +003 +e'—1) 
% p 1 1 a 2 " 1 1?{ _— 
n , a f , n’ 1 es , a 
BE = MOVIL) Seay, ee D9 +1 
i<j 12 ij 6 
E > ririd = n’(n’ — 1) (8’ + 1) (28’ + 1) (108”? + 78’ — 6) 
a 180 
Also $n'$ and $m'$ are binomial random variables with probability of success equal to half so that 

$$E n' = \tfrac{1}{2}n, \quad E n'^2 = \tfrac{1}{4}(n^2 + n), \quad E n'^3 = \tfrac{1}{8}(n^3 + 3n^2),$$ 
$$E n'^4 = \tfrac{1}{16}(n^4 + 6n^3 + 3n^2 - 2n), \quad E n'^5 = \tfrac{1}{32}(n^5 + 10n^4 + 15n^3 - 10n^2).$$    (2·4) 

Using these results, we obtain 

$$ES = \tfrac{1}{24} n[5s^2 - 3sn - 2n^2 - 12s + 12n],$$    (2·5) 
var S = ‘Ce 5p [61s* + 3315" — 120n* + 480ns — 23448 — 600n + 2636}, (2:6) 
x 


3. Expectation and variance of S under the alternative. Consider the total number of positive 
observations only. Since $r'_i$ is the rank of the $i$th observation on $Y$ in order of magnitude in the combined sample 
of positive observations, we have 

$$r'_i = i + \sum_{t=1}^{m'} K(X_t, Y_i),$$    (3·1) 

where $X_1, X_2, \ldots, X_{m'}$ is a random sample of $m'$ observations from the truncated population having 
c.d.f. $F'(x) = 2F(x) - 1$ and $Y_1, Y_2, \ldots, Y_{n'}$ is an ordered sample drawn from the truncated population 
having the c.d.f. $G'(x) = 2G(x) - 1$. This is the basic relation that will be used in computing the moments 
of $S$. We will illustrate the method by computing 

$$E\left[\sum_{i=1}^{n'} r_i'^2 \,\Big|\, n', m'\right].$$ 

We have on squaring and summing over $i$ 

$$\sum_{i=1}^{n'} (r'_i - i)^2 = \sum_{i=1}^{n'} \sum_{t=1}^{m'} K(X_t, Y_i) + \sum_{i=1}^{n'} \sum_{t \ne s} K(X_t, Y_i) K(X_s, Y_i).$$ 

Taking expectation 

$$E \sum_{i=1}^{n'} (r'_i - i)^2 = m' \sum_{i=1}^{n'} P\{X_1 < Y_i\} + m'(m'-1) \sum_{i=1}^{n'} P\{X_1 < Y_i, X_2 < Y_i\}$$ 
$$= m' \sum_{i=1}^{n'} \int_0^\infty F'(u) \frac{n'!}{(i-1)!\,(n'-i)!} [G'(u)]^{i-1} [1 - G'(u)]^{n'-i}\, dG'(u)$$ 
$$\quad + m'(m'-1) \sum_{i=1}^{n'} \int_0^\infty [F'(u)]^2 \frac{n'!}{(i-1)!\,(n'-i)!} [G'(u)]^{i-1} [1 - G'(u)]^{n'-i}\, dG'(u)$$ 
$$= m'n' \int_0^\infty F'\, dG' + n'm'(m'-1) \int_0^\infty F'^2\, dG'.$$ 

Also 

$$E \sum_{i=1}^{n'} i(r'_i - i) = E \sum_{i=1}^{n'} i \sum_{t=1}^{m'} K(X_t, Y_i) = m' \sum_{i=1}^{n'} i P\{X_1 < Y_i\} = m'n'(n'-1) \int_0^\infty F'G'\, dG' + m'n' \int_0^\infty F'\, dG'.$$ 

Hence 

$$E \sum_{i=1}^{n'} r_i'^2 = E \sum_{i=1}^{n'} (r'_i - i)^2 + 2E \sum_{i=1}^{n'} i(r'_i - i) + \sum_{i=1}^{n'} i^2$$ 
$$= 3m'n' \int_0^\infty F'\, dG' + m'n'(m'-1) \int_0^\infty F'^2\, dG' + 2m'n'(n'-1) \int_0^\infty F'G'\, dG' + \tfrac{1}{6}n'(n'+1)(2n'+1).$$ 

Let 

$$M_{ij} = \int [F'(x)]^i [G'(x)]^j\, dG'(x),$$    (3·2) 

where the range of integration is from 0 to $\infty$. Then for fixed $n'$, $m'$, we have 

$$E \sum_{i=1}^{n'} r'_i = \tfrac{1}{2}n'(n'+1) + m'n' M_{10},$$    (3·3) 

$$E \sum_{i=1}^{n'} r_i'^2 = 3m'n' M_{10} + m'(m'-1)n' M_{20} + 2m'n'(n'-1) M_{11} + \tfrac{1}{6}n'(n'+1)(2n'+1),$$    (3·4) 


n’ n’ 
E> r3 = Tm'n’ Mio + 6m’On’ Moo + m’®n’ Mo + 12m’n’ M4, + 3m’ n’®) Mg, + 8m’n’O M2 + } a, 
i i=1 


; (3:5) 
n 
E > rit = 15m’n’ Mig + 25m’? n’ Mg + 10m’On’ M9 + mn’ M go 
i=1 
+ 50m’n’®)M 4, + 30m’? n’® Ma, + 4m’On’®) M3, 
+ 30m’n’™ Mio aa 6m’? 'n’®) Moo oa 4m’n’ Mi + 4 “4, (3-6) 
Gat 
as 
EY rirs = m’n’°(3M jo — 2M41) + m’On’@4LM 5 + m’n’O4M jg + z 7) (3-7) 
— i<j 
E > rer, = m’n’?(18M jo — 12M 44) + mn’? 4M 99 — 3M), + 5M%i] 
i+j 
+m'On!M jo Mig + m'n Mj, + $M io] +m'n'®[5Mio— 6Min + 8Min] 
me 
+m’n/82M3, Min + 4Mio]+ Dd 7, (3-8) 
i+j 


ad 
EY rir}? = m’n’®27M io — 22Mi1] + m’n’®[22.M jo — 30M io + 32M ix] 


i<j 


+2m’On’®) Myy Mi, + m’n[ 9M, + Mio — 4M] 
+ m'2n 222. M2 + 16M 99 — 15M gy] + m’2n’[ 2M 33 + 4M a9) +m’ n"4M3 


4+ man's 3M iy Mio+6 fre ace) {* F”(u) acu) | +m’n"%My, + > ay? 
0 i<j 


x 
+m '9l $y —3Mia + OMig Min +12 i) F’(x) d@’(x) J F’(u) arwyaar |. 
0 
Replacing prime by double prime, we obtain the corresponding results for functions involving $r''_i$, where 
$F''(x) = 2F(x)$ and $G''(x) = 2G(x)$, the range of integration being from $-\infty$ to 0. Using these results we 
have 
ES = ¥mn{( — Mio + 6M io — 4 — 2M fi, — 2M 9 — M39 — 2M 31) + m( M9 — 3M fo + Mao + Min + 2) 
+n(2Mii —3Mi,+ 2Mii+Miot+2)}, (3-10) 
var [S, | n’,m’] x m4n’[ Mo — M25) 


zx 
+ m'n| 4M —8Mio.Mi,—4M22 +12 i F(x) dG@’(z) | F’(u) aan) | 
0 


zx 
+ mrnl atin —16MZ —8M3.Mi + 24 [ 2a) acre) | F’(u) G’(u) aan) | 
0 


+m’n'*[4M 4, — 4Mi — 4Mj5] + 82m’2n’} [M39 — M32] + s2m’n?}[2Mj)—M2—2Mi,) 
+am'n’ 2M + 2M4o— 2M 49 Mig — 4M, Mio] +8m’*n'[ Mio — Mig Mio] 
+em'n’[ Mig — 3M is + 2Mi, — 2M‘, Mio), (3-11) 
var[S, | n”,m”] = m”4n”[_M go — Mg + 4M 2 — 4M 75 — 4M 3 + 4M io M2] 
+m’n"[ — 12M", —4M%,—4M72 +4M%,+12M%,+8M%, M%,—4M72] 


zx 
+m”? [ 4305 —4Ms?2 +12 | F”(x) dQ’ (x) | F”*(u) d@”(u) 
—o 
— 8M Mi, + 8Mio— 8M7, — 20M75 + 12M{,M% + 16M, Mi, — 1M | 


xz 
+m"2n”2 [ —16M%? 424 fre dG” (x) i) F"(u) G” (u) d@”(u) 
«o 


+ 12M — 8M M7, — 24M7, — 20M7§ + 12M, + 24M7, Mi, + 8M 4% Mie | 
+82m’”2n"}[M 30 — M72] + s*m’n”*4[2M{)— 2M7, — M73) 
+8m”n”"[4Mfo— 4M71, — 6M{j5 +4Mi, Miy + 2Miy Mo] 
+sm’n"[3M%y — 6M%, —2M72 + 3M7, + 2M%, Mio] + sm”2n”[2M 39 — 2M15 — M39 + M%p Mi), 
(3-12) 
var [E[S | n’,m’]] x dymn{m3[(I + D)?+ 8P(1+ D+ 2P)]+n3{(1+ £)?+ 8P(1+ E+ 2P)] 
+mn*[3I? + 4H? + 48P?+4DI+16DB+ 6EI—-16AH 4+ 241P+4DE+32EP+8DP] 
+mn[3I? + 4D? + 48P?+ 421+ 16HA+4+6DI—-16BD+24IP+4DE+32DP+8EP}}, (3-13) 
where A = My—2M%+1, B=2M%y,—2M%+1, D= Mo—M+2Mi—-1 
E = 2M\,-—2M{,+2Mjo-1, [T= Mio—Miot+1, P=3{Mio-—1). 
4. Consistency of the S test. We observe that $S$ is a sum of several generalized U statistics. Hence, using 
Lehmann's (unpublished) result on the asymptotic normality of generalized U statistics, it follows that $S$ 
is asymptotically normally distributed both under the hypothesis and the alternative. 
Let the critical region under the null hypothesis consist of those $S$'s satisfying 

$$ES - S \ge t_n \sigma,$$ 

where $\lim_{n \to \infty} t_n = t$. 

Then $P\{ES - S \ge t_n\sigma \mid A\} = P\{E_A S - S \ge k\sigma_A\}$, where $k = (t_n\sigma + E_A S - ES)/\sigma_A$. 
Using Tchebycheff's inequality and the expressions for $E_A S$ and $\sigma_A$ given above, it follows that 

$$\lim_{m,n \to \infty} P\{ES - S \ge t_n\sigma \mid A\} = 1,$$ 

which is the requirement for consistency. 
5. Asymptotic efficiency of the S test. Putting $F(x) = G(\theta x)$ in (3·1) and proceeding as in Mood (1954), 
we have 

$$\left[\frac{\partial E(S)}{\partial \theta}\right]_{\theta=1} = mn(s-2) \left[2\int_{-\infty}^{\infty} xG(x)g^2(x)\,dx - \int_{-\infty}^{0} xg^2(x)\,dx\right].$$ 

Efficacy of the S test is therefore 

$$\frac{180 \times 16\, mn(s-2)^2}{61 s^3} \left[2\int_{-\infty}^{\infty} xG(x)g^2(x)\,dx - \int_{-\infty}^{0} xg^2(x)\,dx\right]^2.$$    (5·1) 

Also the efficacy of the variance ratio F test is 

$$\frac{4mn}{(m+n)(\beta_2 - 1)},$$    (5·2) 

where 

$$\beta_2 = \frac{\int (x - Ex)^4\, dF(x)}{\left[\int (x - Ex)^2\, dF(x)\right]^2}.$$ 

Hence, the asymptotic relative efficiency of the S test with respect to the variance ratio F test is given by 

$$e_{S,F} = \frac{720}{61}(\beta_2 - 1) \left[2\int_{-\infty}^{\infty} xG(x)g^2(x)\,dx - \int_{-\infty}^{0} xg^2(x)\,dx\right]^2.$$    (5·3) 

From the formula (5·3), it is obvious that depending on $g(x)$, $0 < e_{S,F} < \infty$. In particular if $g(x)$ is the 
standard normal density function, $e_{S,F}$ = 0·69. If $g(x) = 1$, $-\tfrac{1}{2} < x < \tfrac{1}{2}$, then $e_{S,F}$ = 0·80. From Sukhatme 
(1957), the asymptotic relative efficiency of the T test with respect to F test is given by 

$$e_{T,F} = 12(\beta_2 - 1) \left[\int_{0}^{\infty} xg^2(x)\,dx - \int_{-\infty}^{0} xg^2(x)\,dx\right]^2.$$    (5·4) 

It follows that the asymptotic relative efficiency of S test with respect to T test is given by 

$$e_{S,T} = \frac{60}{61} \cdot \frac{\left[2\int_{-\infty}^{\infty} xG(x)g^2(x)\,dx - \int_{-\infty}^{0} xg^2(x)\,dx\right]^2}{\left[\int_{0}^{\infty} xg^2(x)\,dx - \int_{-\infty}^{0} xg^2(x)\,dx\right]^2}.$$    (5·5) 

From the above discussion it is seen that the test is reasonably efficient for normal alternatives and 
highly efficient for some non-normal alternatives. The test however presupposes knowledge about the 
location of the two populations which is not always the case. We therefore modify the test by applying it 
to the deviations of the observations from the sample medians rather than to the observations themselves. 
The modified test statistic may be defined as 


$$\hat{S} = \sum_{i=1}^{n} \sum_{j \ne k} Q(X_j - \tilde{X}, X_k - \tilde{X}, Y_i - \tilde{Y}) + 2 \sum_{i \ne p} \sum_{j=1}^{m} Q(X_j - \tilde{X}, Y_p - \tilde{Y}, Y_i - \tilde{Y})$$ 
$$\quad + (\tfrac{1}{2}s - 1) \sum_{i=1}^{m} \sum_{j=1}^{n} K(X_i - \tilde{X}, Y_j - \tilde{Y}),$$    (5·6) 

where $\tilde{X}$ and $\tilde{Y}$ are the sample medians. 

We observe that $\hat{S}$ is a sum of several modified generalized U statistics as defined in Sukhatme (1958). 
The necessary and sufficient conditions for a modified generalized U statistic to have the same 
asymptotic normal distribution as the original generalized U statistic are stated in a theorem derived by 
Sukhatme (1958). These conditions are satisfied in the present case. It follows that $S$ and $\hat{S}$ have the 
same asymptotic normal distribution. Hence the test based on $\hat{S}$ is asymptotically distribution free. 

Summary. A two-sample distribution free test has been proposed for comparing variances, and 
a general formula derived for its asymptotic efficiency with respect to the variance ratio F test. The test 
presupposes knowledge about the relative location of the two populations. In case this is not available, 
the test can be suitably modified, and it has been shown that under certain regularity conditions the 
modified test is asymptotically distribution free. 
The mean difference and the mean deviation of some discontinuous distributions 


By T. A. RAMASUBBAN 
London School of Economics and Political Science 


INTRODUCTION 


Although much attention has been paid to the study of the mean difference and the mean deviation of 
the normal distribution, very little seems to be known about these statistics in respect of some of the 
important discontinuous distributions like the hypergeometric, the binomial and the Poisson, etc. This 
paper is an attempt towards this direction. 

The usual definitions of the mean difference $\Delta$ and the mean deviation $\delta$ (about the mean $\mu$) may 
be considered as particular cases of more generalized ones which can be defined as follows: 

$$\Delta_r = \sum_i \sum_j |i - j|^r\, h(i)\, h(j)$$    (1·1) 

and 

$$\delta_r = \sum_i |i - \mu|^r\, h(i),$$    (1·2) 

where the summations are with respect to all the admissible discrete values of $i$ and $j$ and $h(t)$ refers to the 
discontinuous frequency function at $i$ or $j$ equal to $t$. Further, associated with $\Delta_r$ and $\delta_r$, let 

$$D_r = \frac{\Delta_r}{\mu_2^{r/2}}$$    (1·3) 

and 

$$G_r = \frac{\delta_r}{\mu_2^{r/2}},$$    (1·4) 

where $\mu_2$ is the variance of the distribution. It will be noted that the ratio $G_r$ is an extension of 
Geary's ratio, which is equal to $G_1$. 

In this paper I shall confine my attention to $\Delta_1$ and $\delta_1$, together with the corresponding ratios $D_1$ and 
$G_1$. In a later paper I shall deal with $\Delta_{2r}$, $\Delta_{2r+1}$, and $\delta_{2r+1}$, for $r > 0$. Consideration of $\delta_{2r}$ is unnecessary, 
since it is quite obvious that $\delta_{2r} = \mu_{2r}$, the $2r$th moment about the mean. 


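2. MEAN DIFFERENCE ($\Delta_1$) 
With $\Delta_1$ defined in (1·1) 

$$\Delta_1 = \sum_i \sum_j |i - j|\, h(i)\, h(j) = 2 \sum_i \sum_{j \le i} (i - j)\, h(i)\, h(j),$$    (2·1) 

since $\sum_i \sum_j (i - j)\, h(i)\, h(j) = 0$. 

Using expression (2·1), I now proceed to find $\Delta_1$ for the binomial, the negative binomial, the Poisson, 
the logarithmic and the geometric distributions. 


(i) The binomial distribution 


The frequency function of this distribution is 

$$h(i) = \binom{n}{i} p^i q^{n-i} \qquad (i = 0, 1, \ldots, n)$$    (2·2) 

with $q + p = 1$. 
The mean difference is therefore given by 

$$\Delta_1 = 2 \sum_{i=0}^{n} \sum_{j=0}^{i} (i - j) \binom{n}{i} \binom{n}{j} p^{i+j} q^{2n-i-j}$$ 
$$= 2 \sum_{i=0}^{n} \binom{n}{i} p^i q^{n-i} \left\{ i \binom{n}{0} q^n + (i-1) \binom{n}{1} p q^{n-1} + \ldots + 0 \cdot \binom{n}{i} p^i q^{n-i} \right\}.$$    (2·3) 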

Summing now over $i$, term by term, 

$$\Delta_1 = 2npq \left[ \sum_{i=1}^{n-1} \binom{n-1}{i-1} \binom{n-1}{i} p^{2i-1} q^{2n-2i-1} + \sum_{i=0}^{n-1} \binom{n-1}{i}^2 p^{2i} q^{2n-2i-2} \right]$$    (2·4) 

$$= 2npq(A + B), \text{ say.}$$ 

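It is then not difficult to show that $A$ is the coefficient of $t^{-1}$ in $\{(q + pt)(q + pt^{-1})\}^{n-1}$, i.e. the coefficient 
of $t^{n-2}$ in $\{t + pq(1-t)^2\}^{n-1}$ and hence that 

$$A = \binom{2}{0}\binom{n-1}{1} pq - \binom{4}{1}\binom{n-1}{2} (pq)^2 + \binom{6}{2}\binom{n-1}{3} (pq)^3 - \ldots.$$    (2·5) 

Likewise, 

$$B = 1 - \binom{2}{1}\binom{n-1}{1} pq + \binom{4}{2}\binom{n-1}{2} (pq)^2 - \ldots.$$    (2·6) 

Substitution of (2·5) and (2·6) in (2·4) yields, after some further reduction 

$$\Delta_1 = 2npq \sum_{i=0}^{n-1} (-1)^i \binom{n-1}{i} (pq)^i \left[ \binom{2i}{i} - \binom{2i}{i-1} \right]$$ 
$$= 2npq \sum_{i=0}^{n-1} (-1)^i \binom{n-1}{i} \frac{1}{i+1} \binom{2i}{i} (pq)^i$$    (2·7) 
$$= 2 \sum_{i=1}^{n} (-1)^{i-1} \binom{n}{i} \binom{2i-2}{i-1} (pq)^i.$$    (2·8) 

If we expand (2·7) and rearrange, we also find 

$$\Delta_1 = 2npq \left[ 1 + \frac{(-n+1)\,\tfrac{1}{2}}{1!\; 2} (4pq) + \frac{(-n+1)(-n+2)\,\tfrac{1}{2} \cdot \tfrac{3}{2}}{2!\; 2 \cdot 3} (4pq)^2 + \ldots \right]$$ 
$$= 2npq\; {}_2F_1(-n+1, \tfrac{1}{2};\, 2;\, 4pq),$$    (2·9) 

where ${}_2F_1(\alpha, \beta;\, \gamma;\, x) = 1 + \dfrac{\alpha\beta}{1!\,\gamma} x + \dfrac{\alpha(\alpha+1)\beta(\beta+1)}{2!\,\gamma(\gamma+1)} x^2 + \ldots.$ 

From the computational point of view, however, relation (2·4) is to be preferred, since the numerical 
values of the terms like $\binom{N}{i} p^i q^{N-i}$ which make up this relation are available for different $N$ and $i$ in 
Biometrika Tables for Statisticians, 1 (1954 edition). Table 1, tabulated below for some typical values 
of $n$ and $p$, gives an idea of the behaviour of $\Delta_1$ for both small and large values of $n$ and $p$. 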
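
As a check on the binomial case, the mean difference can be computed both from its definition as a double sum and from the two-sum form 2npq(A + B). The sketch below is illustrative only; the function names are invented here, and the two-sum form is written out as it is read from the garbled print, so agreement with the direct sum is the point of the exercise.

```python
from math import comb

def delta1_direct(n, p):
    """Mean difference of the binomial from the defining double sum."""
    q = 1.0 - p
    h = [comb(n, i) * p ** i * q ** (n - i) for i in range(n + 1)]
    return sum(abs(i - j) * h[i] * h[j]
               for i in range(n + 1) for j in range(n + 1))

def delta1_two_sums(n, p):
    """Mean difference as 2npq(A + B), A and B being sums of binomial products."""
    q = 1.0 - p
    a = sum(comb(n - 1, i - 1) * comb(n - 1, i)
            * p ** (2 * i - 1) * q ** (2 * n - 2 * i - 1)
            for i in range(1, n))
    b = sum(comb(n - 1, i) ** 2 * p ** (2 * i) * q ** (2 * n - 2 * i - 2)
            for i in range(n))
    return 2 * n * p * q * (a + b)

for n, p in [(6, 0.5), (6, 0.1), (11, 0.3)]:
    print(n, p, round(delta1_direct(n, p), 4), round(delta1_two_sums(n, p), 4))
```

For n = 6 and p = 0·5 both routes give 1·3535 to four decimals, the value shown for that entry in Table 1.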


(ii) The negative binomial distribution 
For this distribution 

$$h(i) = q^{-n} \frac{(n+i-1)!}{(n-1)!\; i!} \left(\frac{p}{q}\right)^i \qquad (i = 0, 1, 2, \ldots),$$    (2·10) 

where $q - p = 1$ and $n > 0$. 

Table 1. Values of $\Delta_1$ for the binomial distribution 

  n \ p     0·1       0·2       0·3       0·4       0·5 
    6     0·7343    1·0534    1·2296    1·3236    1·3535 
   11     1·0584    1·4598    1·6879    1·8109    1·8501 
   16     1·3028    1·7748    2·0459    2·1925    2·2392 
   21     1·3956    2·0418    2·3500    2·5169    2·5701 
   26     1·6870    2·2777    2·6191    2·8041    2·8631 
   31     1·8492    2·4913    2·8629    3·0641    3·1287 

Table 2. Values of $D_1 = \Delta_1/\sqrt{\mu_2}$ 

  n \ p     0·1       0·2       0·3       0·4       0·5 
    6     0·9992    1·0752    1·0954    1·1030    1·1051 
   11     1·0637    1·1003    1·1105    1·1145    1·1156 
   16     1·0857    1·1093    1·1161    1·1189    1·1196 
   21     1·0151    1·1139    1·1191    1·1211    1·1217 
   26     1·1028    1·1167    1·1209    1·1225    1·1230 
   31     1·1071    1·1186    1·1221    1·1233    1·1239 























Values of $D_1 = \Delta_1/\sqrt{\mu_2}$ where $\mu_2 = npq$ are given in Table 2 for the same $n$ and $p$ as in Table 1. They 
will be seen to approach the limiting value for the normal distribution, viz. $2/\sqrt{\pi}$ = 1·1284, most rapidly 
as $n$ increases when $p$ is in the neighbourhood of 0·5. 


Omitting the algebraic details which are exactly similar to those for the positive binomial distribution, 
it may be shown that 

$$\Delta_1 = 2npq \left[ \sum_{i=0}^{\infty} (-1)^i \binom{n+i}{i} \frac{(2i)!}{i!\,(i+1)!} (pq)^i \right]$$    (2·11) 
$$= 2npq\; {}_2F_1(n+1, \tfrac{1}{2};\, 2;\, -4pq).$$    (2·12) 

It will be seen that (2·12) is obtained by changing the sign of $n$ and $p$ in (2·9), demonstrating the same 
equivalence as holds for the moments of these two distributions. 


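(iii) The Poisson distribution 
The density function being given by 

$$h(i) = \frac{e^{-\lambda} \lambda^i}{i!} \qquad (i = 0, 1, 2, \ldots),$$    (2·13) 

$$\Delta_1 = 2e^{-2\lambda} \sum_{i=0}^{\infty} \sum_{j=0}^{i} (i - j) \frac{\lambda^{i+j}}{i!\, j!}$$ 
$$= 2\lambda e^{-2\lambda} \left[ \sum_{i=1}^{\infty} \frac{\lambda^{i-1}}{(i-1)!} \left( \sum_{j=0}^{i} \frac{\lambda^j}{j!} \right) - \sum_{i=1}^{\infty} \frac{\lambda^i}{i!} \left( \sum_{j=0}^{i-1} \frac{\lambda^j}{j!} \right) \right]$$ 
$$= 2\lambda e^{-2\lambda} \left[ \sum_{i=0}^{\infty} \frac{\lambda^i}{i!} \left( \sum_{j=0}^{i+1} \frac{\lambda^j}{j!} - \sum_{j=0}^{i-1} \frac{\lambda^j}{j!} \right) \right]$$ 
$$= 2\lambda e^{-2\lambda} \left[ \sum_{i=0}^{\infty} \frac{\lambda^{2i}}{(i!)^2} + \sum_{i=0}^{\infty} \frac{\lambda^{2i+1}}{i!\,(i+1)!} \right].$$    (2·14) 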
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This also follows by proceeding to the limit in (2-4). By considering the limit in (2-8), we have an in- 
teresting alternative form for A, viz. 





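$$\Delta_1 = 2\sum_{i=0}^{\infty}(-1)^i\binom{2i}{i}\frac{\lambda^{i+1}}{(i+1)!}. \tag{2-15}$$

Forms (2-14) and (2-15) establish an identity

$$\lambda e^{-2\lambda}\sum_{i=0}^{\infty}\left[\frac{\lambda^{2i}}{(i!)^2}+\frac{\lambda^{2i+1}}{i!\,(i+1)!}\right] = \sum_{i=0}^{\infty}(-1)^i\binom{2i}{i}\frac{\lambda^{i+1}}{(i+1)!}. \tag{2-16}$$

If $I_n(x)$ is the modified Bessel function of the first kind defined by

$$I_n(x) = \sum_{i=0}^{\infty}\frac{(\tfrac12 x)^{n+2i}}{i!\,(n+i)!}, \tag{2-17}$$

we find for the Poisson distribution, from (2-14),

$$\Delta_1 = 2\lambda e^{-2\lambda}\{I_0(2\lambda)+I_1(2\lambda)\}, \tag{2-18}$$

and hence from (2-16) and (2-17)

$$\frac{d\Delta_1}{d\lambda} = \frac{d}{d\lambda}\left[2\lambda e^{-2\lambda}\{I_0(2\lambda)+I_1(2\lambda)\}\right] = 2e^{-2\lambda}I_0(2\lambda),$$

giving

$$\Delta_1 = 2\int_0^{\lambda} e^{-2\lambda}I_0(2\lambda)\,d\lambda. \tag{2-19}$$

Still another form for $\Delta_1$ is possible. Consider expression (2-15). This can be slightly adjusted to give

$$\Delta_1 = 2\lambda\;{}_1F_1(\tfrac12;\,2;\,-4\lambda). \tag{2-20}$$

Again, from (2-19) and (2-20),

$$\int_0^{\lambda} e^{-2\lambda}I_0(2\lambda)\,d\lambda = \lambda\;{}_1F_1(\tfrac12;\,2;\,-4\lambda). \tag{2-21}$$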


Table 3 shows the values of $\Delta_1$ and $D_1$ for different $\lambda$. $\Delta_1$ has been obtained using expression (2-18).
The values of $I_0(2\lambda)$ and $I_1(2\lambda)$ have been taken from the British Association Mathematical Tables, 10,
Bessel Functions, Part II.


Table 3. Values of $\Delta_1$ and $D_1$ for the Poisson distribution

It will be seen that while differing considerably for small values of $\lambda$, $D_1$ tends to the normal value of
1.1284 as $\lambda$ increases.
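
As a numerical cross-check of the three Poisson forms, the sketch below evaluates the Bessel-function expression (2-18), the alternating series (2-15) and a truncated version of the defining double sum. It assumes the mean difference is $\sum_i\sum_j|i-j|\,h(i)h(j)$; the function names, the truncation points and the restriction to moderate $\lambda$ (so that the power series stay well within floating-point range) are choices made here, not part of the paper.

```python
from math import comb, exp, factorial, sqrt


def bessel_i(order, x, terms=60):
    """Series (2-17): I_n(x) = sum_i (x/2)^(n+2i) / (i! (n+i)!)."""
    return sum(
        (x / 2.0) ** (order + 2 * i) / (factorial(i) * factorial(order + i))
        for i in range(terms)
    )


def delta1_bessel(lam):
    """Form (2-18): 2*lam*exp(-2*lam)*(I_0(2*lam) + I_1(2*lam))."""
    return 2.0 * lam * exp(-2.0 * lam) * (bessel_i(0, 2 * lam) + bessel_i(1, 2 * lam))


def delta1_alternating(lam, terms=80):
    """Form (2-15): 2 * sum_i (-1)^i C(2i, i) lam^(i+1) / (i+1)!."""
    return 2.0 * sum(
        (-1) ** i * comb(2 * i, i) * lam ** (i + 1) / factorial(i + 1)
        for i in range(terms)
    )


def delta1_direct(lam, cutoff=80):
    """Truncated double sum over the Poisson probabilities (assumed definition)."""
    h = [exp(-lam) * lam**i / factorial(i) for i in range(cutoff)]
    return sum(abs(i - j) * h[i] * h[j] for i in range(cutoff) for j in range(cutoff))


if __name__ == "__main__":
    for lam in (0.5, 1.0, 2.0, 5.0):
        d = delta1_bessel(lam)
        print(lam, round(d, 4), round(d / sqrt(lam), 4))  # Delta_1 and D_1
```

The alternating series loses accuracy through cancellation as $\lambda$ grows, so (2-18) is the safer form for computation.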
(iv) The logarithmic distribution 
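For the logarithmic distribution,

$$h(i) = \frac{C\alpha^{i}}{i}\qquad (i = 1,2,\ldots), \tag{2-22}$$

where $C = -1/\log_e(1-\alpha)$ and $\alpha<1$.
Thus

$$\Delta_1 = 2C^2\sum_{i=1}^{\infty}\sum_{j=1}^{i}(i-j)\frac{\alpha^{i+j}}{ij},$$

which reduces to

$$\Delta_1 = \frac{2}{\{\log(1-\alpha)\}^2}\,(A-B), \tag{2-23}$$

where

$$A = \sum_{i=1}^{\infty}\alpha^{i}\left(\sum_{j=1}^{i}\frac{\alpha^{j}}{j}\right), \tag{2-24}$$

$$B = \sum_{i=1}^{\infty}\frac{\alpha^{i}}{i}\left(\sum_{j=1}^{i}\alpha^{j}\right). \tag{2-25}$$

Term by term summation by i gives

$$A = \frac{1}{1-\alpha}\{-\log(1-\alpha^2)\},\qquad B = \frac{\alpha}{1-\alpha}\log(1+\alpha).$$

Substituting these values in (2-23),

$$\Delta_1 = \frac{2}{(1-\alpha)\{\log(1-\alpha)\}^2}\left[-\log\{(1-\alpha^2)(1+\alpha)^{\alpha}\}\right]. \tag{2-26}$$

$\Delta_1$ and $D_1$ for various values of $\alpha$ in the range 0.1 (0.1) 0.9 are tabulated in Table 4.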


Table 4. $\Delta_1$ and $D_1$ for the logarithmic distribution

   alpha     0.1      0.2      0.3      0.4      0.5      0.6      0.7      0.8      0.9
   Delta_1   0.1061   0.2194   0.3506   0.5080   0.6958   0.9784   1.3885   2.1288   4.0856
   D_1       0.4341   0.5756   0.6712   0.7395   0.7776   0.8231   0.8444   0.8515   0.8373





An interesting feature can be observed in the above table, viz. that $D_1$ increases with $\alpha$ up to $\alpha = 0.8$
and decreases at $\alpha = 0.9$, suggesting a maximum value for $D_1$ in the range $0.8<\alpha<0.9$.


(v) The geometric distribution 
In this case,

$$h(i) = qp^{i}\qquad (i = 0,1,2,\ldots), \tag{2-27}$$

where $q+p = 1$.
By a straightforward simplification,

$$\Delta_1 = 2q^2\left[\frac{p(1+2p)}{q^3(1+p)^2}-\frac{p^2}{q^3(1+p)^2}\right] = \frac{2p}{1-p^2}. \tag{2-28}$$

Also

$$D_1 = \frac{\Delta_1}{\sqrt{\mu_2}} = \frac{2p}{1-p^2}\cdot\frac{1-p}{\sqrt{p}} = \frac{2\sqrt{p}}{1+p}, \tag{2-29}$$

so that $\lim_{p\to 1} D_1 = 1$.

Using relations (2-28) and (2-29), $\Delta_1$ and $D_1$ have been computed for different values of p and are given in
Table 5.

Table 5. Values of $\Delta_1$ and $D_1$ for the geometric distribution

     p      Delta_1    D_1          p      Delta_1    D_1
    0.02    0.0406    0.2773       0.40     0.9524   0.9035
    0.04    0.0801    0.3846       0.50     1.3333   0.9428
    0.06    0.1204    0.4622       0.60     1.8750   0.9682
    0.08    0.1610    0.5238       0.70     2.7451   0.9843
    0.10    0.2020    0.5750       0.80     4.4444   0.9938
    0.20    0.4167    0.7454       0.90     9.4737   0.9986
    0.30    0.6593    0.8427       0.98    49.4949   0.9999








The approach of $D_1$ to the limiting value of unity as $p \to 1$ is very clear from the above table.
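
The closed forms for the geometric distribution are easy to confirm against a truncated double sum. This is only a sketch: it assumes the mean difference is $\sum_i\sum_j|i-j|\,h(i)h(j)$ with $h(i)=qp^i$, and the helper names and the truncation point are arbitrary choices.

```python
from math import sqrt


def geometric_delta1(p):
    """Closed form 2p / (1 - p^2) for the mean difference."""
    return 2.0 * p / (1.0 - p * p)


def geometric_d1(p):
    """Closed form 2*sqrt(p) / (1 + p) for Delta_1 / sqrt(mu_2)."""
    return 2.0 * sqrt(p) / (1.0 + p)


def geometric_delta1_direct(p, cutoff=400):
    """Truncated double sum sum_i sum_j |i - j| h(i) h(j), h(i) = q p^i."""
    q = 1.0 - p
    h = [q * p**i for i in range(cutoff)]
    return sum(abs(i - j) * h[i] * h[j] for i in range(cutoff) for j in range(cutoff))


if __name__ == "__main__":
    for p in (0.1, 0.5, 0.9):
        print(p, round(geometric_delta1(p), 4), round(geometric_d1(p), 4))
```

The truncation at 400 terms is ample for p up to about 0.9; values of p closer to 1 would need a longer sum.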


3. MEAN DEVIATION ($\delta_1$)
From (1-2)

$$\delta_1 = \sum_{i}|i-\mu|\,h(i) = 2\sum_{i=m}^{\infty}(i-\mu)\,h(i), \tag{3-1}$$

where m is the smallest integer greater than $\mu$ if $\mu$ is not an integer, and equal to $\mu$ if it is itself an integer.
Johnson (1957) has recently obtained the value of $\delta_1$ (and $G_1$) of the binomial distribution. I have
therefore considered the other distributions, viz. the hypergeometric, the negative binomial, the Poisson,
the logarithmic and the geometric distributions.


(i) The hypergeometric distribution 
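The frequency function of this distribution is

$$h(i) = \binom{Np}{i}\binom{Nq}{n-i}\bigg/\binom{N}{n}, \tag{3-2}$$

and the mean $\mu = np$.
Therefore from (3-1)

$$\delta_1 = 2\sum_{i=m}^{n}(i-np)\,h(i).$$

The above sum can be easily evaluated by replacing $(i-np)$ by

$$(1/N)\{i(Nq-n+i)-(n-i)(Np-i)\}$$

and $h(i)$ as defined in (3-2) by

$$\binom{n}{i}\frac{(Np)^{(i)}(Nq)^{(n)}}{N^{(n)}(Nq-n+i)^{(i)}},$$

where $a^{(r)} = a(a-1)(a-2)\ldots(a-r+1)$.

Thus

$$\delta_1 = \frac{2}{N}\sum_{i=m}^{n}\{i(Nq-n+i)-(n-i)(Np-i)\}\binom{n}{i}\frac{(Np)^{(i)}(Nq)^{(n)}}{N^{(n)}(Nq-n+i)^{(i)}}$$

$$\phantom{\delta_1} = \frac{2(Nq)^{(n)}}{N\,N^{(n)}}\left[\sum_{i=m}^{n} i\binom{n}{i}\frac{(Np)^{(i)}}{(Nq-n+i-1)^{(i-1)}}-\sum_{i=m}^{n}(n-i)\binom{n}{i}\frac{(Np)^{(i+1)}}{(Nq-n+i)^{(i)}}\right]$$

$$\phantom{\delta_1} = \frac{2n(Nq)^{(n)}}{N\,N^{(n)}}\left[\sum_{i=m}^{n}\binom{n-1}{i-1}\frac{(Np)^{(i)}}{(Nq-n+i-1)^{(i-1)}}-\sum_{i=m}^{n}\binom{n-1}{i}\frac{(Np)^{(i+1)}}{(Nq-n+i)^{(i)}}\right]$$

$$\phantom{\delta_1} = \frac{2n(Nq)^{(n)}}{N\,N^{(n)}}\binom{n-1}{m-1}\frac{(Np)^{(m)}}{(Nq-n+m-1)^{(m-1)}}$$

$$\phantom{\delta_1} = \frac{2m(Nq-n+m)}{N}\,\binom{Np}{m}\binom{Nq}{n-m}\bigg/\binom{N}{n}. \tag{3-3}$$

It is readily verified that as $N \to \infty$, this tends to the value for the binomial distribution.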
(3-3) has a very useful property which may simplify the numerical computation of $\delta_1$. Suppose n and
n' are two integers such that $n+n' = N$, and further let $m = np$ and $m' = n'p$, so that np and n'p are also
integers. Then

$$\delta_1\ (\text{corresponding to } n') = \frac{2m'(Nq-n'+m')}{N}\,\binom{Np}{m'}\binom{Nq}{n'-m'}\bigg/\binom{N}{n'}$$

$$= \frac{2(N-n)p\,(Nq-N+n+Np-np)}{N}\,\binom{Np}{Np-np}\binom{Nq}{N-n-Np+np}\bigg/\binom{N}{N-n}$$

$$= \frac{2npq(N-n)}{N}\,\binom{Np}{m}\binom{Nq}{n-m}\bigg/\binom{N}{n}.$$

From (3-3),

$$\delta_1\ (\text{corresponding to } n) = \frac{2np(Nq-n+np)}{N}\,\binom{Np}{m}\binom{Nq}{n-m}\bigg/\binom{N}{n} = \frac{2npq(N-n)}{N}\,\binom{Np}{m}\binom{Nq}{n-m}\bigg/\binom{N}{n},$$

since $m = np$.

Thus $\delta_1$ (corresponding to n') = $\delta_1$ (corresponding to n).

Moreover, the variance $\mu_2 = npq(N-n)/(N-1)$ remains unaltered by substituting n' for n.
Thus, the values of $G_1$ corresponding to n and n' are equal.
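
A small exact check of (3-3) and of the n, n' symmetry can be made with rational arithmetic. The sketch below is an illustration under assumed notation: `M` stands for Np (the number of marked items in the population), and all function names are hypothetical.

```python
from fractions import Fraction
from math import ceil, comb


def hyper_pmf(N, M, n, i):
    """h(i) = C(M, i) C(N - M, n - i) / C(N, n), with M playing the role of Np."""
    return Fraction(comb(M, i) * comb(N - M, n - i), comb(N, n))


def mean_deviation_direct(N, M, n):
    """sum_i |i - mu| h(i) with mu = nM/N, evaluated exactly."""
    mu = Fraction(n * M, N)
    return sum(abs(i - mu) * hyper_pmf(N, M, n, i) for i in range(n + 1))


def mean_deviation_closed(N, M, n):
    """Closed form (3-3): 2 m (Nq - n + m) h(m) / N, m = smallest integer >= mu."""
    m = ceil(Fraction(n * M, N))
    return Fraction(2 * m * (N - M - n + m), N) * hyper_pmf(N, M, n, m)


if __name__ == "__main__":
    N, M = 20, 8  # p = 0.4, so n = 5 and n' = 15 both give integral np
    for n in (5, 15, 7):
        print(n, mean_deviation_direct(N, M, n), mean_deviation_closed(N, M, n))
```

With N = 20 and Np = 8 the samples n = 5 and n' = 15 give the same mean deviation, as the symmetry argument predicts.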





(ii) The negative binomial distribution 
For the negative binomial (2-10), $\mu = np$. Hence

$$\delta_1 = 2\sum_{i=m}^{\infty}(i-np)\,h(i) = 2\sum_{i=m}^{\infty}\{iq-(n+i)p\}\,h(i),$$

which on simplification gives

$$\delta_1 = 2m\binom{n+m-1}{m}\frac{p^{m}}{q^{\,n+m-1}}. \tag{3-4}$$
(iii) The Poisson distribution 
With $\mu = \lambda$, $\delta_1 = 2\sum_{i=m}^{\infty}(i-\lambda)\,h(i)$, where $h(i)$ is given by (2-13). This reduces to

$$\delta_1 = \frac{2m\,e^{-\lambda}\lambda^{m}}{m!}. \tag{3-5}$$

The result for the binomial distribution may easily be shown to tend to this limit as $n \to \infty$, $p \to 0$ and
$np \to \lambda$.


(iv) The logarithmic distribution 
The frequency function is given by (2-22) and the mean $\mu = C\alpha/(1-\alpha)$. Thus

$$\delta_1 = 2\sum_{i=m}^{\infty}\left(i-\frac{C\alpha}{1-\alpha}\right)\frac{C\alpha^{i}}{i}.$$

This may readily be shown to reduce to

$$\delta_1 = \frac{2C\alpha}{1-\alpha}\{C\beta_m-(1-\alpha^{m-1})\}, \tag{3-6}$$

where

$$\beta_m = \sum_{r=1}^{m-1}\frac{\alpha^{r}}{r}\quad(\text{for } m\geq 2),\qquad \beta_m = 0\quad(\text{for } m = 1).$$


(v) The geometric distribution 
The mean of the distribution (2-27) is p/q.

$$\delta_1 = 2\sum_{i=m}^{\infty}\left(i-\frac{p}{q}\right)h(i) = 2mp^{m}. \tag{3-7}$$
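
The Poisson and geometric results can be verified against direct summation of $|i-\mu|\,h(i)$. The following is a sketch only: the Poisson closed form is taken as $2m\,e^{-\lambda}\lambda^{m}/m!$ and the geometric one as $2mp^{m}$, with m the smallest integer not less than the mean; the function names and the truncation points are assumptions made for the illustration.

```python
from math import ceil, exp, factorial


def mean_deviation_direct(pmf, mean):
    """sum_i |i - mean| h(i) over a (truncated) list of probabilities."""
    return sum(abs(i - mean) * h for i, h in enumerate(pmf))


def poisson_md_closed(lam):
    """2 m exp(-lam) lam^m / m!, m = smallest integer >= lam."""
    m = ceil(lam)
    return 2.0 * m * exp(-lam) * lam**m / factorial(m)


def geometric_md_closed(p):
    """2 m p^m with m = smallest integer >= p/q."""
    m = ceil(p / (1.0 - p))
    return 2.0 * m * p**m


if __name__ == "__main__":
    lam, p = 2.5, 0.6
    poisson = [exp(-lam) * lam**i / factorial(i) for i in range(100)]
    geometric = [(1.0 - p) * p**i for i in range(400)]
    print(poisson_md_closed(lam), mean_deviation_direct(poisson, lam))
    print(geometric_md_closed(p), mean_deviation_direct(geometric, p / (1.0 - p)))
```

When the mean is an integer, rounding in `p / (1.0 - p)` may push m one step up, but both closed forms take the same value at m and m + 1 in that case, so the result is unaffected.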


Dr Johnson, who was kind enough to read this section on the mean deviation, has pointed out a
remarkable general result which includes the hypergeometric, the binomial, the Poisson and the geometric
distributions, viz.

$$\delta_1 = 2\,(\text{variance})\,(\text{frequency function at } m), \tag{3-8}$$

where m is the greatest integer not greater than the mean. He also notes that for continuous distributions,
if $m = \mu$, the mean, the relation holds for the normal curve and for a Pearson Type III distribution, but
fails to hold for the double exponential and for a Pearson Type VII distribution.

It is hoped to deal with some points arising out of Dr Johnson’s suggestion in more detail later. 


In conclusion, I should like to express my thanks to Prof. M. G. Kendall for his helpful advice and 
encouragement. 
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The mean deviation of the Poisson distribution 


By EDWIN L. CROW 
National Bureau of Standards, Boulder, Colorado 


The recent derivation of the mean deviation of the binomial distribution by Johnson (1957) suggests the 
corresponding derivation for the Poisson distribution. This derivation, in some contrast to that for the 
binomial distribution, is a simple exercise, but the result appears to be new and yields some interesting 
comparisons with the mean deviations of the normal and binomial distributions. 

Let the mean of the Poisson distribution be denoted by m, the largest integer not exceeding m by [m], 
and the mean absolute deviation (from the mean) by MD. Then 














$$MD = \sum_{r=0}^{\infty}|r-m|\,\frac{e^{-m}m^{r}}{r!}$$

$$\phantom{MD} = e^{-m}\left[\sum_{r=0}^{[m]}(m-r)\frac{m^{r}}{r!}+\sum_{r=[m]+1}^{\infty}(r-m)\frac{m^{r}}{r!}\right]$$

$$\phantom{MD} = e^{-m}\left[\sum_{r=0}^{[m]}\frac{m^{r+1}}{r!}-\sum_{r=0}^{[m]-1}\frac{m^{r+1}}{r!}+\sum_{r=[m]}^{\infty}\frac{m^{r+1}}{r!}-\sum_{r=[m]+1}^{\infty}\frac{m^{r+1}}{r!}\right]$$

$$\phantom{MD} = 2e^{-m}\frac{m^{[m]+1}}{[m]!} \tag{1}$$

$$\phantom{MD} = 2mY, \tag{2}$$

where Y is the maximum value of the Poisson probability. The form (2) would also result from taking the
limit of the mean deviation of the binomial distribution as derived by Frame (1945),

$$MD_n = 2npq\,Y_{n-1}, \tag{3}$$

where $MD_n$ denotes the mean deviation of the binomial distribution with parameters p, $q = 1-p$, and
n, and $Y_{n-1}$ is the maximum value of the terms in the expansion of $(q+p)^{n-1}$.
The ratio of the mean deviation to the standard deviation is then

$$R(m) = \frac{MD}{\sqrt{m}} = 2m^{1/2}\,Y. \tag{4}$$


Both R(m) and MD can be quickly calculated from existing tables (Molina, 1942; Pearson & Hartley,
1954). The results of such calculation can be summarized as in Fig. 1. It may be confirmed that the
discontinuity in [m] causes no discontinuity of R(m), but does cause a discontinuity in its slope at integral
values of m. Let $m = k+u$, where $0<u<1$, $k = 0,1,2,\ldots$. Then

$$\frac{dR(m)}{dm} = \frac{dR(k+u)}{du} = \frac{2}{k!}\,e^{-k-u}(k+u)^{k-1/2}\left(\tfrac12-u\right) \tag{5}$$

$$>0,\quad 0<u<\tfrac12;\qquad =0,\quad u=\tfrac12;\qquad <0,\quad \tfrac12<u<1.$$

In particular $R'(k+0) = k^{-1/2}\,Y = -R'(k-0)$ $(k = 1,2,\ldots)$.

Thus R(m) has infinitely many relative minima, at $m = k$, and infinitely many relative maxima, at
$m = k-\tfrac12$, $k = 1,2,\ldots$. The absolute maximum is 0.8578, at $m = \tfrac12$; the absolute minimum zero, at $m = 0$.
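
The closed form for MD and the behaviour of R(m) at integers and half-integers can be checked numerically. This sketch takes MD as $2e^{-m}m^{[m]+1}/[m]!$ and R(m) as MD divided by $\sqrt{m}$; the function names and the truncation of the direct sum are choices made here for illustration.

```python
from math import exp, factorial, floor, pi, sqrt


def poisson_mean_deviation(m):
    """Closed form 2 exp(-m) m^([m]+1) / [m]!, with [m] = floor(m)."""
    k = floor(m)
    return 2.0 * exp(-m) * m ** (k + 1) / factorial(k)


def poisson_mean_deviation_direct(m, cutoff=120):
    """Truncated sum of |r - m| exp(-m) m^r / r!."""
    return sum(abs(r - m) * exp(-m) * m**r / factorial(r) for r in range(cutoff))


def ratio(m):
    """R(m): mean deviation divided by the standard deviation sqrt(m)."""
    return poisson_mean_deviation(m) / sqrt(m)


if __name__ == "__main__":
    normal_value = sqrt(2.0 / pi)
    for m in (0.5, 1.0, 1.5, 2.0, 2.5, 10.0, 10.5):
        print(m, round(ratio(m), 4), "above" if ratio(m) > normal_value else "below")
```

The printed values alternate between "above" at half-integers and "below" at integers, which is the damped oscillation about $\sqrt{2/\pi}$ described in the text.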





[Fig. 1: graph of the ratio R against m = mean, for m from 0 to 10.]

Fig. 1. Ratio of mean deviation to standard deviation for binomial, Poisson, and normal
distributions. Binomial values are plotted for two fixed values of p (shown only for n<9) and
for n = 1 and n = 4, together with the Poisson curve and the normal value.


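By applying Stirling's second-term approximation to k! we obtain from (4)

$$R(k+u) = \sqrt{2/\pi}\,\left(1+\frac{u}{k}\right)^{k+1/2}\exp\left[-u-\frac{1}{12k}+O(k^{-3})\right]. \tag{6}$$

Now

$$\left(1+\frac{u}{k}\right)^{k+1/2} = \exp\left[(k+\tfrac12)\ln\left(1+\frac{u}{k}\right)\right] = \exp\left[u+\frac{1}{2k}(u-u^2)+O(k^{-2})\right]. \tag{7}$$

Hence

$$R(k+u) = \sqrt{2/\pi}\,\exp\left[\frac{1}{2k}\left(u-u^2-\tfrac16\right)+O(k^{-2})\right]$$

$$\phantom{R(k+u)} = \sqrt{2/\pi}\,\left[1+\frac{1}{2k}\left(u-u^2-\tfrac16\right)+O(k^{-2})\right]. \tag{8}$$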
Furthermore, it follows from the alternating series property in (7) that 
(2/7) exp E (u—u?—4)+ ovr) | <R(k+u)<,/(2/7) exp E: (uw—%})+ ort) | 3 


where O(k-*) comes from k! only, and thus introduces an error in R(k +4) of less than } % for k>1. In 
rticul 
es al Rk) <V(2/m) (k = 1,2,.-.)s 


1 
R(k+4)>./(2/7) exp E- + on) | >J(2/7) (k= 1,2,...). 
Since R(m) is continuous, R(m) therefore takes on the normal limiting value infinitely often. Equation (5)
shows that R(m) equals $\sqrt{2/\pi}$ exactly once between $m = k$ and $m = k+\tfrac12$ and exactly once between
$m = k+\tfrac12$ and $m = k+1$, $k = 1,2,\ldots$; since $R(0) = 0$ and $R(\tfrac12) = 0.8578$, this statement also holds for $k = 0$.
Equation (8) shows that for sufficiently large k the points of equality are at $u = \tfrac12\pm\sqrt{3}/6$, and Fig. 1
indicates that these are correct to one decimal place even for $k = 1$.
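
The location of the two crossings of $\sqrt{2/\pi}$ within a unit interval can be found by bisection and compared with the asymptotic values $\tfrac12\pm\sqrt{3}/6$. The sketch below assumes $R(k+u) = 2(k+u)^{k+1/2}e^{-k-u}/k!$ for $0\le u<1$; the bisection routine and its name are illustrative, not part of the note.

```python
from math import exp, factorial, pi, sqrt


def ratio_on_interval(k, u):
    """R(k + u) = 2 (k + u)^(k + 1/2) exp(-k - u) / k! for 0 <= u <= 1."""
    return 2.0 * (k + u) ** (k + 0.5) * exp(-k - u) / factorial(k)


def crossing(k, lo, hi, tol=1e-12):
    """Bisection for the u in [lo, hi] at which R(k + u) = sqrt(2/pi)."""
    target = sqrt(2.0 / pi)
    f_lo = ratio_on_interval(k, lo) - target
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = ratio_on_interval(k, mid) - target
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    print("asymptotic:", 0.5 - sqrt(3.0) / 6.0, 0.5 + sqrt(3.0) / 6.0)
    for k in (1, 2, 5, 20):
        print(k, crossing(k, 0.0, 0.5), crossing(k, 0.5, 1.0))
```

The roots drift towards the asymptotic pair as k increases, and already agree with it to one decimal place at k = 1.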


By (8),

$$R(k) = \sqrt{2/\pi}\,\left(1-\frac{1}{12k}\right)+O(k^{-2}), \tag{9}$$

$$R(k+\tfrac12) = \sqrt{2/\pi}\,\left(1+\frac{1}{24k}\right)+O(k^{-2}). \tag{10}$$

Equation (9) is quite accurate for small values of k, to four decimal places for k > 8, and is easily improved
in the form (6); (10) is less accurate. Since R(k+1) is given by precisely the right-hand member of (9)
also, the ratio of the maximum positive difference from $\sqrt{2/\pi}$ to the two adjacent maximum negative
differences is asymptotically $\tfrac12$. The value of $\tfrac12$ for the ratio of the maximum positive difference to the
mean of the two adjacent maximum negative differences is, by calculation from (4), accurate to two
figures for $k+\tfrac12>2.5$; it is in fact better than the asymptotic expressions with either one or two further
terms for $k+\tfrac12<7.5$.

In addition to the recurring exact equivalence of the Poisson distribution to the normal distribution
with respect to R(m) there would be difficulty in distinguishing the distributions by use of R(m) due to
sampling fluctuations. This may be confirmed by examining the percentage points of the distribution of
the statistic a = (mean deviation)/(standard deviation) for samples from a normal population (Pearson
& Hartley, Table 34A). The upper 5% point for sample size n = 36 is 0.8578, which also is the absolute
maximum of R(m), attained at m = 0.5. The lower 5% point for n = 36 is 0.7440, which exceeds R(m)
only for m<0.21 and 0.98<m<1.02. For n = 1001 the upper and lower 5% points are 0.8090 and
0.7869, which include between them the R(m) maxima for m>3 and the R(m) minima for m>6. Since
the Poisson distribution is quite asymmetrical for m<4, say, and is J-shaped for m<1, the ratio of mean
deviation to the standard deviation is a poor criterion for distinguishing the Poisson and normal
distributions, or for determining whether a Poisson distribution is approximated by the normal. The matter is
rather academic since there seems little need or likelihood of such application.

The ratio of mean deviation to the standard deviation of the binomial distribution follows a quite
similar damped oscillation as the mean increases. This can be confirmed by examining Johnson's
asymptotic formula (4) for R in the manner that the exact formula (4) for the Poisson distribution was examined
above. When the mean m = np is varied by varying p with n fixed, R varies continuously between
minima less than $\sqrt{2/\pi}$, with discontinuous but finite slopes at integral m, and smooth maxima greater
than $\sqrt{2/\pi}$ at m near $\tfrac12, \tfrac32, \ldots, n-\tfrac12$. The graph is symmetric about $m = \tfrac12 n$. Only minimum values occur in
Johnson's table. The asymptotic ratio (for large n and $p<\tfrac12$) of the maximum positive difference from
$\sqrt{2/\pi}$ to the absolute values of the adjacent maximum negative differences is

$$\frac{\tfrac12+pq}{1-pq},$$

which approaches the Poisson value as p approaches zero but is unity for $p = \tfrac12$. The behaviour for small n
is illustrated in Fig. 1 by the representative case n = 4 and the extreme case n = 1. Since exact
calculation confirms for small n the qualitative behaviour indicated by the asymptotic formula for R, we may
conclude in particular that for any fixed n there are exactly 2n values of p for which R for the binomial
distribution equals $\sqrt{2/\pi}$.

If p is fixed while n varies, then m = p, 2p, ..., and R does not have a continuous graph as a function
of m. The behaviour can be examined by considering n continuous, with results similar to those preceding.
Two such cases of fixed p are plotted in Fig. 1. The upper bound of the ratio of mean deviation to
standard deviation for any distribution is attained with $p = \tfrac12$, n = 1.


It is a pleasure to acknowledge helpful suggestions by M. M. Siddiqui. 
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Note on the characteristic function of a serial-correlation distribution 


By ROY LEIPNIK 
Test Department, US NOTS, China Lake, California 


1. Kendall (1957) and White (1957) have recently derived formulae for the moments of a two-para- 
meter distribution which has appeared in several studies of serial correlation (Leipnik (1947); Quenouille 
(1948)). In an unpublished paper, the writer found a Neumann-type series for the characteristic function 
of this distribution for a generalized parameter range. It may now be of interest to reproduce this result 
here, along with compact expressions for the moments derived therefrom. 

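2. Let the function $f = f_{\lambda,\rho}$ be defined for $|x|<1$, $\lambda>0$, $|\rho|<1$ by

$$f_{\lambda,\rho}(x) = \frac{\Gamma(\lambda+1)}{\Gamma(\tfrac12)\,\Gamma(\lambda+\tfrac12)}\,(1-x^2)^{\lambda-1/2}\,(1+\rho^2-2\rho x)^{-\lambda}. \tag{1}$$

This is clearly non-negative in the given ranges for x, $\lambda$, $\rho$. The Fourier transform $\phi$ of the above can
be calculated as follows.

The generating function for the orthogonal polynomials $C_k^{\lambda}$ of Gegenbauer is (Erdelyi et al. (1953))

$$(1+\rho^2-2\rho x)^{-\lambda} = \sum_{k=0}^{\infty}\rho^{k}\,C_k^{\lambda}(x), \tag{2}$$

which converges for $|\rho|<1$, $|x|\leq 1$. It is known that (Erdelyi) for $\lambda>0$, $n\geq 0$,

$$\sup_{|x|\leq 1}|C_n^{\lambda}(x)| = C_n^{\lambda}(1) = \frac{\Gamma(2\lambda+n)}{n!\,\Gamma(2\lambda)}.$$

Hence, (2) converges uniformly and absolutely in the region $|x|\leq 1$, $|\rho|\leq\rho_0<1$, for each $\lambda>0$. Insertion
of (2) in (1) yields for $\phi$ the result

$$\phi(t) = \int_{-1}^{1}e^{itx}f_{\lambda,\rho}(x)\,dx = \frac{\Gamma(\lambda+1)}{\Gamma(\tfrac12)\,\Gamma(\lambda+\tfrac12)}\sum_{k=0}^{\infty}\rho^{k}\int_{-1}^{1}(1-x^2)^{\lambda-1/2}\,C_k^{\lambda}(x)\,e^{itx}\,dx. \tag{3}$$

The integral of Gegenbauer (Erdelyi et al.), after a suitable change of variable, becomes

$$\int_{-1}^{1}(1-x^2)^{\lambda-1/2}\,C_k^{\lambda}(x)\,e^{itx}\,dx = \frac{\Gamma(2\lambda+k)}{k!\,\Gamma(2\lambda)}\,\Gamma(\tfrac12)\,\Gamma(\lambda+\tfrac12)\,i^{k}\left(\frac{2}{t}\right)^{\lambda}J_{\lambda+k}(t) \tag{4}$$

for $\lambda>0$, $|t|<\infty$, where $J_{\lambda+k}$ is the Bessel function of order $\lambda+k$.
Hence we find

$$\phi(t) = \left(\frac{2}{t}\right)^{\lambda}\Gamma(\lambda+1)\sum_{k=0}^{\infty}(i\rho)^{k}\,\frac{\Gamma(2\lambda+k)}{k!\,\Gamma(2\lambda)}\,J_{\lambda+k}(t), \tag{5}$$

a series of the Neumann type (Watson (1944)).

3. Moments of $f_{\lambda,\rho}$ can be evaluated by writing $J_{\lambda+k}(t)$ as a series and making an appropriate change
of summation index.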
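We have for $\lambda>0$, $|t|<\infty$, $k\geq 0$

$$J_{\lambda+k}(t) = \sum_{m=0}^{\infty}\frac{(-1)^{m}(\tfrac12 t)^{2m+k+\lambda}}{m!\,\Gamma(\lambda+k+m+1)}. \tag{6}$$

Since the double series

$$\sum_{k=0}^{\infty}\sum_{m=0}^{\infty}|\rho|^{k}\,|\tfrac12 t|^{2m+k}\,\frac{\Gamma(2\lambda+k)}{k!\,m!\,\Gamma(\lambda+k+m+1)} \leq \sum_{k=0}^{\infty}\sum_{m=0}^{\infty}|\rho|^{k}\,|\tfrac12 t|^{k}\,\frac{\Gamma(2\lambda+k)}{k!\,\Gamma(\lambda+k+1)}\,\frac{|\tfrac12 t|^{2m}}{m!}$$

$$= \exp(\tfrac14 t^2)\sum_{k=0}^{\infty}|\tfrac12\rho t|^{k}\,\frac{\Gamma(2\lambda+k)}{k!\,\Gamma(\lambda+k+1)}$$

converges for all $\lambda>0$, $\rho$, t by the ratio test, we can insert (6) into (5) and rearrange the summation to
obtain

$$\phi(t) = \frac{\Gamma(\lambda+1)}{\Gamma(2\lambda)}\sum_{p=0}^{\infty}\frac{(it)^{p}}{2^{p}}\sum_{m=0}^{[p/2]}\frac{\Gamma(2\lambda+p-2m)\,\rho^{p-2m}}{\Gamma(\lambda+p-m+1)\,m!\,(p-2m)!}. \tag{7}$$

Thus for the pth moment of $f_{\lambda,\rho}$ about zero we obtain at once

$$\mu'_p = \frac{p!}{2^{p}}\,\frac{\Gamma(\lambda+1)}{\Gamma(2\lambda)}\sum_{m=0}^{[p/2]}\frac{\Gamma(2\lambda+p-2m)\,\rho^{p-2m}}{\Gamma(\lambda+p-m+1)\,m!\,(p-2m)!} = \mu'_p(\lambda,\rho). \tag{8}$$

Clearly $\mu'_p$ is an even polynomial in $\rho$ when p is even, and an odd polynomial when p is odd. The
polynomials $\mu'_p(\lambda,\rho)$ are similar to, but more complicated than, the Gegenbauer polynomials $C_p^{\lambda}(\rho)$.
As $\lambda\to\infty$, it is easy to verify that

$$\lim_{\lambda\to\infty}\frac{\Gamma(2\lambda+p-2m)}{\Gamma(2\lambda)}\,\frac{\Gamma(\lambda+1)}{\Gamma(\lambda+p-m+1)} = \delta_{m,0}\,2^{p},$$

where $\delta_{i,j}$ is the Kronecker symbol, so that the moments $\mu'_p$ tend to $\rho^{p}$, the moments of the c.f. $\psi(t) = e^{i\rho t}$
corresponding to a unit distribution concentrated at $\rho$.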
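Explicitly, we have for p = 0, 1, 2, 3

$$\mu'_0 = 1,$$

$$\mu'_1 = \frac{\lambda}{\lambda+1}\,\rho,$$

$$\mu'_2 = \frac{\lambda(\lambda+\tfrac12)}{(\lambda+1)(\lambda+2)}\,\rho^2+\frac{1}{2(\lambda+1)},$$

$$\mu'_3 = \frac{\lambda(\lambda+\tfrac12)(\lambda+1)}{(\lambda+1)(\lambda+2)(\lambda+3)}\,\rho^3+\frac{3}{2}\,\frac{\lambda}{(\lambda+1)(\lambda+2)}\,\rho.$$

If we set $\lambda = \tfrac12 n$, then for integer n, the moments $\mu'_p$ agree with the formulae given by Kendall and
indicated by White. I have checked this for p = 0, 1, ..., 10.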
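
The moment formula (8) can be compared with a direct numerical integration of $x^{p}f_{\lambda,\rho}(x)$. The sketch below substitutes $x=\cos\theta$, which turns the integrand into a smooth function of $\theta$ on $[0,\pi]$, and applies Simpson's rule; the function names, the step count and the test values $\lambda = 2$, $\rho = 0.5$ are assumptions made for the illustration.

```python
from math import cos, factorial, gamma, pi, sin


def moment_closed(p, lam, rho):
    """Formula (8) for the p-th moment about zero."""
    prefactor = factorial(p) / 2**p * gamma(lam + 1.0) / gamma(2.0 * lam)
    total = 0.0
    for m in range(p // 2 + 1):
        total += (
            gamma(2.0 * lam + p - 2 * m)
            * rho ** (p - 2 * m)
            / (gamma(lam + p - m + 1.0) * factorial(m) * factorial(p - 2 * m))
        )
    return prefactor * total


def moment_numeric(p, lam, rho, steps=4000):
    """Simpson's rule on x = cos(theta); steps must be even."""
    norm = gamma(lam + 1.0) / (gamma(0.5) * gamma(lam + 0.5))

    def integrand(theta):
        x = cos(theta)
        return x**p * sin(theta) ** (2.0 * lam) * (1.0 + rho * rho - 2.0 * rho * x) ** (-lam)

    h = pi / steps
    total = integrand(0.0) + integrand(pi)
    for j in range(1, steps):
        total += (4.0 if j % 2 else 2.0) * integrand(j * h)
    return norm * total * h / 3.0


if __name__ == "__main__":
    for p in range(5):
        print(p, moment_closed(p, 2.0, 0.5), moment_numeric(p, 2.0, 0.5))
```

For p = 0 both columns should be very close to 1, which also confirms the normalizing constant in (1).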


4. The proof of convergence of the standardized distribution to the normal, as $\lambda\to\infty$, can also be
carried out in terms of the characteristic function. (Kendall's derivation is based on examination of the
moments.)

Note first that 0 = (a3/A), (ui/0) = At B,, where 


amur-e(ro()}- =(2%5)(0096) 


The c.f. ¢ of the standardized distribution is given by 


A(t) = exp (— mt) 6(<) = = exp (— aro (5 ay" P(A+1), 
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We now utilize a formula of Neumann for the square of a Bessel function, namely (Watson, 1944) 

















(42)2" <o m (2n + 27-1) 
Jn(2) = —— —1)"2" es" 9 
= Tint Ito "AL Gat ens 23 oi 
Clearly 0) = eae 1 (-1)" s\" mim — (2A42k+2j-1) 
tals, ~ (T(A+k+1)) 2, a jai (2A+2k+j) (2A + 2k +2) j° 
For large A, it follows that 
Abt\ — {F(Abt/ay)}At*#/ B (-1)™ )"" 1 ) k 
Jws(E)= Se (ea bz. 2" (1+0(3)) 
_ {B(Abt/a)}A+* e k 
= "Fares &?(~ag)(1+0(;))- ss 


Hence T(A+1) 


t)= —iri 
A(t) = exp ( iA‘Byt— a) ey Tea) 
2 ey T(2A +k) 3) 
oe 
2, Qa, ) RIT(A+h+ i +0(; 
ae a ipatt 1 
= gee ——: m2 
exp idiByt aa) (2a, A+1; a “)(1+0(;)). (11) 
where ,F’, is the confluent hypergeometric function. Consider now the function 


Y,(z) = exp (— 2tAOz — dz?) ,F, (2A, A+ 1, iAdz). 


From the differential equation xy” +(f—2x)y’—ay = 0 of y(x) = ,F,(«,f; x), it follows that Y,(z) 
satisfies the differential equation 


= y’ (ae “r) 








2+ +(40—1)2+—) 





+ y,| 20+ 20 — ar+(s + 4iatgr—ginig 42 At? 


x aT *) 2 £+28(40—1)2t+ 528 | = 0. (12) 


iva 
af 
A+1 
2¥y ioe 3A—1 “ r,(# 262 3A—1 ~\. 


PY 3 
If we choose 6 = ——, 6 = 2/ ) — 1, we obtain 
A+1 


A TONG TAs Ta AT ME AFT GA 
If we let A > c formally, we find the limiting equation Y,, = 0, Y,, = constant. Since Y,(0) = 1 for each 
A, we have Y,,(z) = 1. 


(13) 


More rigorously, we can expand Y) (z) in the series Y,(z) = 1+ > A-**Y\” (z) and show that the 
=0 
sequence { Y\)(z)} is such that Y)(z) = 1+0(A-) for fixed z. Thus we hee 


ipatt id' pt A \'- *) 4 
AF, (22, A+1,5 =) en (sara* Pen |e (1+0(A-4)). 


al Ap? 
= i, 3 —— _ -4 
and A(t) (exp [ia (ax m(A+1) ~ pr) + taal? aad —p? 1) }) (1+0(A-4)). (14) 
Note that the coefficient of ¢ is 





Mi (of) = af -0 


a)(A+1) o 
A8p2 Pe 1 
and that (Sap et-1) a= -1+0(;). 
Hence we have A(t) = exp (— }®) (14+ 0(A-4)), (15) 


which proves that the standardized distribution tends to the normal. The technique of asymptotic 
differential equations employed above may be useful in other such problems. It is familiar in physics 
under the name ‘W.K.B. method’. 
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Two further applications of a model for binary regression 


By D. R. COX 
Birkbeck College, University of London 


1. Introduction. In a recent paper (Cox, 1958), I have discussed some aspects of a logistic model for 
analysing regression when the dependent variable can take only two values, say 0 and 1. In the present 
note two further applications are presented of what is essentially the same model. The first is to the analy- 
sis of 2 x 2 contingency tables based on matched pairs, and the second is to the testing of the agreement 
between an observed binary sequence and a corresponding sequence of probabilities. 


2. The 2 x 2 contingency table with matched pairs. Consider the form taken by a simple comparison of
matched pairs when the observations are (0, 1) variables. Let there be n pairs of individuals, the pairing
usually being such that the two individuals in any one pair tend to be alike. Let one member of each pair
belong to group A, the other to group B, the assignment being randomized if a comparative experiment
is involved. An observation, taking one of two values 0 and 1, is made on each individual. For the ith
pair, let these be represented by random variables $Y_{ia}$, $Y_{ib}$. The possible observations on a pair, writing
that on A first, are (0, 0), (0, 1), (1, 0) and (1, 1).

It is possible to form a 2 x 2 contingency table from the data

              Group A    Group B
      0          .          .
      1          .          .
    Total        n          n

McNemar (1947) seems to have been the first to point out that the usual $\chi^2$ significance test for such
a table is invalid, because it ignores the correlation induced by pairing. He recommended that the
significance of the difference between A and B should be tested by rejecting the pairs (0, 0) and (1, 1), and by
examining whether the proportion of (1, 0)'s among the remaining 'mixed' observations (0, 1) and (1, 0)
is consistent with binomial variation with chance $\tfrac12$. Mosteller (1952) and Cochran (1950) have given
further accounts of this test and Cochran has discussed extensions to the comparison of more than two
groups. Stuart (1957) has recently obtained a test equivalent to McNemar's by arguments based on the
theory of stratified sampling.

This work raises two problems. Are there circumstances under which the test is optimum, and is there
a corresponding estimation procedure? To deal with these questions we must set up a parametric model
covering the non-null case. The simplest such model seems to be the following. Let all random variables
be mutually independent and let there be a parameter $\lambda_i$ characteristic of the ith pair and a parameter $\psi$
describing the true difference between A and B, such that

$$\Pr(Y_{ia} = 1)/\Pr(Y_{ia} = 0) = \lambda_i, \tag{1}$$

$$\Pr(Y_{ib} = 1)/\Pr(Y_{ib} = 0) = \psi\lambda_i. \tag{2}$$

If we write A; = e*i, yy = e4, we have the logistic model of the earlier paper. 
It follows by the arguments of that paper, in particular of §4-5, that the jointly sufficient set of
statistics consists of (i) $\sum_{i=1}^{n} Y_{ib}$, (ii) the pair totals $(Y_{ia}+Y_{ib})$, $i = 1,\ldots,n$. Further, optimum inference
about $\psi$, regarding $\lambda_1,\ldots,\lambda_n$ as unknown nuisance parameters, is based on the distribution of (i)
conditionally on the set (ii). Now whenever $Y_{ia}+Y_{ib}\neq 1$, the contribution of the ith pair to (i) is fixed. Hence,
the conditional distribution just mentioned is equivalent to that of R = number of pairs (0, 1)
conditionally on the observed value of M = number of pairs (0, 1) or (1, 0).

Now a simple calculation from (1) and (2) shows that

$$\Pr(Y_{ia} = 0,\ Y_{ib} = 1 \mid Y_{ia}+Y_{ib} = 1) = \psi/(1+\psi) = \theta,\ \text{say}. \tag{3}$$

Therefore R, conditionally on the observed value of M, has a binomial distribution

$$\Pr(R = r \mid M = m) = \binom{m}{r}\theta^{r}(1-\theta)^{m-r}. \tag{4}$$

In particular the optimum test of the null hypothesis $\psi = 1$, $\theta = \tfrac12$ is McNemar's test, and confidence
intervals for $\theta$ and hence for $\psi$ are obtained in the usual way for a binomial parameter. The significance
test can be looked on as the very special case of Haldane & Smith's (1948) test for a serial order effect
obtained when each series contains just two items.


Example. Mosteller (1952) illustrated the test on an experiment in which each of 100 subjects used
both of two drugs A and B, the response being a dichotomy 'not-nausea', 'nausea' (0 and 1, say). 81
subjects never had nausea, i.e. gave the observation (0, 0), 9 subjects gave (1, 0), i.e. had nausea with A
but not with B, 1 subject gave (0, 1) and 9 gave (1, 1). The significance test of the null hypothesis that the
drugs are equally liable to induce nausea amounts to testing whether a division of 10 trials into (9, 1) is
significantly extreme in a binomial distribution with chance $\tfrac12$. The exact significance level in a two-sided
test is 11/512 ≈ 0.021; as an approximation to this, we get from a $\chi^2$ test, corrected for continuity, that
significance is attained at very nearly the 0.025 level. A table of 95% confidence limits for the binomial
probability (Hald, 1952) gives (0.003, 0.445) as the limits for $\theta$ and hence the odds factor $\psi$ is between
1/300 and 4/5.
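
The arithmetic of this example can be reproduced in a few lines. The sketch below is illustrative: it takes the two-sided exact level as twice the smaller binomial tail with chance 1/2, uses the continuity-corrected statistic $(|b-c|-1)^2/(b+c)$ for the discordant counts b and c, and converts limits for $\theta$ into limits for $\psi$ by $\psi = \theta/(1-\theta)$; the function names are hypothetical.

```python
from fractions import Fraction
from math import comb, erfc, sqrt


def exact_two_sided_level(r, m):
    """Twice the smaller tail of Bin(m, 1/2) at r, capped at 1, as an exact fraction."""
    lower = sum(comb(m, i) for i in range(0, r + 1))
    upper = sum(comb(m, i) for i in range(r, m + 1))
    return min(Fraction(1), Fraction(2 * min(lower, upper), 2**m))


def corrected_chi_square_level(b, c):
    """Continuity-corrected chi-square (1 d.f.) and its upper-tail probability."""
    statistic = (abs(b - c) - 1) ** 2 / (b + c)
    return statistic, erfc(sqrt(statistic / 2.0))  # P(chi2_1 > x) = erfc(sqrt(x/2))


def odds_factor(theta):
    """psi = theta / (1 - theta), the inverse of theta = psi / (1 + psi)."""
    return theta / (1.0 - theta)


if __name__ == "__main__":
    print(exact_two_sided_level(1, 10))           # R = 1 pair (0, 1) out of M = 10
    print(corrected_chi_square_level(9, 1))
    print(odds_factor(0.003), odds_factor(0.445))  # limits for psi from limits for theta
```

The confidence limits for $\theta$ themselves are taken from the quoted table, not computed here.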

Tests and interval estimates comparing the values of y in different experiments can be done by familiar 
techniques for binomial variates. 


3. Test of agreement between a sequence and a set of probabilities. Let $Y_1,\ldots,Y_n$ be mutually
independent random variables each taking the values (0, 1) and let $p_1,\ldots,p_n$ be a given set of numbers, $0<p_i<1$.
Suppose that it is required to use observations on $Y_1,\ldots,Y_n$ to test the hypothesis that

$$\Pr(Y_i = 1) = p_i\qquad (i = 1,\ldots,n). \tag{5}$$

For example, a weather forecaster might put forward each day a number purporting to be the
probability that it will rain the following day. It might then be required to test whether the observed
occurrences of rain are consistent with these probabilities.

If n is large, we may group the trials into sets each with nearly constant $p_i$; then the observed
proportion of 1's in each set can be compared with the corresponding $p_i$. Let n be too small for this test to be used.

One method of deriving a small sample test, when special alternatives to (5) are not available, is to
consider a family of probabilities derived from (5). This family is characterized by a continuous
parameter $\beta$ and is defined by

$$\log\{\Pr{}_{\beta}(Y_i = 1)/\Pr{}_{\beta}(Y_i = 0)\} = \beta\log\{p_i/(1-p_i)\}. \tag{6}$$

The null hypothesis (5) corresponds to $\beta = 1$. If $\beta>1$, the suggested probabilities $p_i$ show the right general
pattern of variation, but do not vary enough. If $0<\beta<1$, the suggested probabilities vary too much.
If $\beta<0$, the $p_i$ vary in the wrong direction and if $\beta = -1$, the $p_i$ are the complements of the true
probabilities.
The log likelihood under (6) of an observed series $y_1,\ldots,y_n$ is

$$\beta\sum y_i\log p_i+\beta\sum(1-y_i)\log(1-p_i)-\sum\log\{p_i^{\beta}+(1-p_i)^{\beta}\}. \tag{7}$$

Hence, the sufficient statistic is obtained by scoring

$$X_i = \begin{cases}\log(2p_i) & \text{when } Y_i = 1,\\ \log[2(1-p_i)] & \text{when } Y_i = 0,\end{cases} \tag{8}$$

and by considering a total score $X = \sum X_i$. The factor 2 is included to make the expected score positive
and to arrange that an event of probability $\tfrac12$ scores 0.
Under the null hypothesis $\beta = 1$,

$$E_1(X) = n\log 2+\sum p_i\log p_i+\sum(1-p_i)\log(1-p_i), \tag{9}$$

$$V_1(X) = \sum p_i(1-p_i)\{\log[p_i/(1-p_i)]\}^2. \tag{10}$$

Provided that n is not very small and that none of the $p_i$ is near 0 or 1, the distribution of X is nearly
normal.

In principle it would be possible to calculate confidence intervals for $\beta$ from an observed value X = x.
If x significantly exceeds (9), this is evidence that $\beta>1$.

Example. Suppose that there are 16 trials, 8 of which have outcome 1 and 8 have outcome 0. Let the
$p_i$ corresponding to the zero observations be 0.1, 0.1, 0.2, 0.2, 0.4, 0.5, 0.6, 0.7, and corresponding to the
unit observations 0.3, 0.3, 0.5, 0.6, 0.6, 0.8, 0.9, 0.9.

Thus the score for the first observation recorded as 0 is $\log[2(1-p_i)] = \log 1.8 = 0.255$, and the score
for the first observation recorded as 1 is $\log(2p_i) = \log 0.6 = -0.222$. We find that the total observed
score x = 1.106 and that under the hypothesis $\beta = 1$, equations (9) and (10) give

$$E_1(X) = 1.030,\qquad V_1(X) = 0.785,$$

so that there is excellent agreement with expectation. Under the hypothesis $\beta = 0$, i.e. that 1's occur
randomly with constant chance $\tfrac12$, we find

$$E_0(X) = n\log 2+\tfrac12\sum\log[p_i(1-p_i)] = -1.329,$$

$$V_0(X) = \tfrac14\sum\{\log[p_i/(1-p_i)]\}^2 = 1.314.$$

The observed value differs significantly from $E_0(X)$ at the 5% level. Thus the data support the idea that
1's do not occur with constant chance $\tfrac12$ and are in excellent agreement with the suggested probabilities.
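
The scoring in this example can be replayed directly. The quoted figures log 1.8 = 0.255 and log 0.6 = -0.222 imply common (base 10) logarithms, so the sketch below uses `log10` throughout; that choice, the function names, and the decision to compute only the observed score, the expectations under the suggested probabilities and under constant chance one-half, and the variance in the latter case are all assumptions of this illustration.

```python
from math import log10

# Suggested probabilities, split by the observed outcome.
P_FOR_ZEROS = [0.1, 0.1, 0.2, 0.2, 0.4, 0.5, 0.6, 0.7]
P_FOR_ONES = [0.3, 0.3, 0.5, 0.6, 0.6, 0.8, 0.9, 0.9]
ALL_P = P_FOR_ZEROS + P_FOR_ONES


def total_score(p_ones, p_zeros):
    """Sum of log(2p) over observed 1's and log(2(1 - p)) over observed 0's."""
    return sum(log10(2.0 * p) for p in p_ones) + sum(log10(2.0 * (1.0 - p)) for p in p_zeros)


def expected_score_suggested(ps):
    """Expectation of the score when Pr(Y_i = 1) = p_i."""
    return len(ps) * log10(2.0) + sum(p * log10(p) + (1.0 - p) * log10(1.0 - p) for p in ps)


def expected_score_constant_half(ps):
    """Expectation of the score when every Y_i is 1 with chance 1/2."""
    return len(ps) * log10(2.0) + 0.5 * sum(log10(p * (1.0 - p)) for p in ps)


def variance_constant_half(ps):
    """Variance of the score when every Y_i is 1 with chance 1/2."""
    return 0.25 * sum(log10(p / (1.0 - p)) ** 2 for p in ps)


if __name__ == "__main__":
    print(total_score(P_FOR_ONES, P_FOR_ZEROS))
    print(expected_score_suggested(ALL_P))
    print(expected_score_constant_half(ALL_P), variance_constant_half(ALL_P))
```

Because the text rounds each score to three decimals before summing, the unrounded total computed here differs from the quoted one in the third decimal place.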


The family (6), on which the test just described is based, is especially appropriate when the sequence
$\{p_i\}$ is known to be correct at and near $p = \tfrac12$ but possibly incorrectly spread around $p = \tfrac12$. Thus we may
call the test based on (9) and (10) a test for spread. A natural generalization is to replace (6) by

$$\log\{\Pr{}_{\beta,\alpha}(Y_i = 1)/\Pr{}_{\beta,\alpha}(Y_i = 0)\} = \beta\log\{p_i/(1-p_i)\}+\alpha, \tag{11}$$

the null hypothesis being that $\beta = 1$, $\alpha = 0$. The pair of sufficient statistics are X, as defined previously,
and $Y = \sum Y_i$. Under the null hypothesis, X, Y are nearly jointly normally distributed with the mean
and variance of X given by (9) and (10) and with

$$E_1(Y) = \sum p_i,\qquad V_1(Y) = \sum p_i(1-p_i), \tag{12}$$

$$C_1(X,Y) = \sum p_i(1-p_i)\log[p_i/(1-p_i)]. \tag{13}$$

Note that if the $p_i$ are symmetrically arranged about $\tfrac12$, X and Y are uncorrelated.

A test for bias ignoring spread will be based on Y alone, i.e. solely on the observed total number of 1's.
If both bias and spread are of interest, it is necessary to specify the relative importance to be attached to
each, if an optimum small-sample procedure is to be found. Since it is rarely possible to do this, a sensible
practical approach is to find the observed values x and y and to see whether

$$\bigl(x-E_1(X),\ y-E_1(Y)\bigr)\begin{pmatrix}V_1(X) & C_1(X,Y)\\ C_1(X,Y) & V_1(Y)\end{pmatrix}^{-1}\begin{pmatrix}x-E_1(X)\\ y-E_1(Y)\end{pmatrix} \tag{14}$$

is significantly large in the $\chi^2$ distribution with 2 degrees of freedom. The expression (14) is, except for
a factor $-\tfrac12$, the exponent in the bivariate normal distribution of X and Y; it is the likelihood ratio statistic
for testing the hypothesis that X, Y have the bivariate normal distribution (9), (10), (12) and (13), against
the hypothesis that X, Y have arbitrary means, but the same covariance matrix as under the null
hypothesis. This, of course, does not allow for the fact that the covariance matrix varies in a determined
way with the parameters $\alpha$ and $\beta$. However, the determination of the correct likelihood ratio criterion
requires the maximum likelihood estimation of $\alpha$ and $\beta$, which is tedious.

Example. Consider the data that were analysed previously. We have that the observed value of Y is
y = 8 and that $E_1(Y) = 7.7$, $V_1(Y) = 2.930$, $C_1(X,Y) = -0.090$. Therefore, the observed value of
Y, as well as that of X, agrees well with its expectation under the suggested scheme of probabilities and
the need for a combined test hardly arises. The formal details of such a test are that

$$(1.106-1.030,\ 8-7.7)\begin{pmatrix}0.785 & -0.090\\ -0.090 & 2.930\end{pmatrix}^{-1}\begin{pmatrix}1.106-1.030\\ 8-7.7\end{pmatrix} \tag{15}$$

is to be tested as $\chi^2$ with 2 degrees of freedom. The value of expression (15) is 0.01: a value smaller than this
would arise by chance only about 1 in 100 times.


There are further problems connected with the general situation discussed here. First, the same set of observations can be consistent with several alternative sequences of probabilities and it may be required to consider which sequence is preferable. It seems reasonable to prefer that sequence of probabilities for which the information in Shannon's sense is a minimum, for this implies minimum uncertainty concerning the outcome of the realized sequence. According to (9), this amounts to preferring the probabilities for which E_0(X) is a minimum. Secondly, it happens in some applications that the probabilities p_i are not given, but have to be estimated from data by fitting a particular type of model, often to the same data with which goodness of fit is to be tested. In such cases, the most satisfactory test of goodness of fit is likely to be obtained by fitting a model containing additional parameters and testing estimates of the additional parameters for significance from zero. The approach of the present section is relevant only when there are available no special forms of alternative specific to the problem.
A note on a series solution of a problem in estimation*


By IRWIN GUTTMAN 


University of Alberta and Princeton University 


1. INTRODUCTION AND SUMMARY 


If t(x) is a sufficient statistic for the family of probability functions {P_θ | θ ∈ Ω} defined over the real line, and if f(x) is an unbiased estimator of a real valued function of the parameter, say g(θ), then it is well known that the function

$$ h(t) = E\{f(X) \mid t\} $$

is an unbiased estimator of g(θ), and that it has smaller variance and risk (for strictly convex loss functions) than f(x), unless of course f(x) = h(t(x)) almost everywhere (P_θ). Further, if t is also a complete statistic, then h(t) is the unique Uniformly Minimum Variance (UMV) unbiased estimate of g(θ).

The above holds for continuous and discrete probability functions P_θ. We discuss here the case where the P_θ are discrete probability distribution functions defined on the real line, with probability densities p_θ(x), where x = 0, 1, 2, ....

Under certain regularity conditions given in § 2, a method of determining h(t), without considering 
unbiased estimators f(x) of g(θ) at all, is given. This has the feature, then, of avoiding the evaluation of
conditional expectations. The method also allows for a solution of a problem raised by Girshick, Mosteller 
& Savage (1946). This is discussed in § 3, where some examples are given to illustrate the theorem of § 2. 
It is interesting to note that a special case of the method has been used by Lehmann & Scheffé (1950) to 
prove completeness of some statistics. 


* Prepared in connexion with research sponsored by the Office of Naval Research. 
2. THE METHOD

Using the notation of §1, we now state the following:

Theorem: If t(X) is a sufficient statistic, assuming only non-negative integer values,* with probabilities

$$ p_\theta(t) = m(\theta)\,k_t\,\theta^t \qquad (t = 0, 1, 2, \ldots) $$

depending on a single parameter θ taking values in an interval containing the origin, then there exists an essentially unique unbiased estimate of g(θ) with uniformly minimum variance if, and only if, G(θ) = g(θ)/m(θ) is analytic at θ = 0, with a power series expansion G(θ) = Σ a_t θ^t such that a_t = 0 for all t for which k_t = 0.

Proof: Sufficiency. We prove that

$$ f_t = a_t/k_t \quad (k_t \neq 0), \qquad f_t \text{ arbitrary} \quad (k_t = 0) $$

are the only estimates having the required properties. For

$$ E(f_t) = \sum f_t\,m(\theta)\,k_t\,\theta^t = m(\theta)\sum a_t\theta^t = m(\theta)\,G(\theta) = g(\theta). $$

That is, it is unbiased. It is essentially unique, since it is only arbitrary at points of zero probability for all θ. To prove it has uniformly minimum variance we need only show t is complete. The values of t with k_t = 0 may be excluded. Putting a_t = 0 we see that f_t = 0 is the only unbiased estimate of zero, proving the result.

Necessity. If f_t is an unbiased estimate of g(θ),

$$ \sum f_t\,m(\theta)\,k_t\,\theta^t = g(\theta), \qquad\text{or}\qquad \sum a_t\theta^t = G(\theta) $$

with a_t = f_t k_t. Since the range of θ includes the origin, G(θ) is analytic at θ = 0, and the rest follows as in the sufficiency part of the proof.

3. SOME EXAMPLES 


(a) The negative binomial. Suppose that items of a manufactured process are such that each item produced is an independent trial with probability Q of being non-defective, and P = 1 − Q of being defective. Suppose the process is subjected to an inspection in n stages (n ≥ 1), the inspection in any one stage continuing as long as the trials show non-defective items, and stopping as soon as a defective turns up. If we let X_i + 1 be the number of independent trials needed to get a defective in the ith stage, then

$$ p_Q(x) = P Q^{x}, \tag{3.1} $$

where x is the number of non-defectives in the ith stage. It is well known that T = Σ_{i=1}^{n} X_i is a sufficient and complete statistic for the parameter Q. The density of T is

$$ p_Q(t) = \binom{n+t-1}{t} P^{n} Q^{t}. $$


Suppose an unbiased estimate of g(Q) = 1 − Q is wanted, based on the above inspection procedure. Then, it is quickly verified that, in the notation of the theorem, provided n > 1,

$$ a_t = \binom{n+t-2}{t}, \qquad k_t = \binom{n+t-1}{t}, $$

and hence h(t) = (n − 1)/(n + t − 1). The fact of h(t) being unbiased was known to Haldane (1945). Because


* If t(X) = Σ_{i=1}^{n} X_i, where the X_i are n independent observations from a distribution of the exponential type, then the condition p_θ(t) = m(θ) k_t θ^t holds automatically. For if the density is of the form f(θ) k(x) θ^x (where θ may be written in exponential form) then t = ΣX_i is sufficient (easily seen by the Neyman Criterion; see Fraser (1957, p. 20)) and has density

$$ [f(\theta)]^{n}\,k^{*}(t)\,\theta^{t} $$

which we denote by m(θ) k_t θ^t.
it is a function of the complete and sufficient statistic, it is the unique UMV unbiased estimate of the parameter (1 − Q).

The above estimate of P was also known to Girshick et al. (1946). In their paper, a method was given which allowed estimates of P^x Q^y to be determined, providing that (x, y) were integral and non-negative. That is, their method would not allow for an unbiased estimate of, say, the variance of X, where X has the density (3.1), that is, of

$$ g(Q) = \frac{Q}{(1-Q)^{2}}. $$

Our method does, for it is again very easy to see that

$$ a_t = \binom{n+t}{t-1}, \qquad k_t = \binom{n+t-1}{t}, $$

obtaining

$$ h(t) = \frac{(n+t)\,t}{(n+1)\,n}. $$

This is the unique UMV unbiased estimate of var (X).
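The two negative binomial estimates can be checked by forming a_t/k_t with exact rational arithmetic and then summing h(t) against the density of T. This is an illustrative sketch under stated assumptions: the function names are hypothetical, n ≥ 2 is assumed, and the series coefficients a_t are read off from the binomial expansions of (1 − Q)^{1−n} and Q(1 − Q)^{−(n+2)}.

```python
from fractions import Fraction
from math import comb


def k_coeff(n, t):
    """k_t for T = X_1 + ... + X_n, namely C(n+t-1, t)."""
    return comb(n + t - 1, t)


def h_defective(n, t):
    """a_t / k_t for g(Q) = 1 - Q; G(Q) = (1-Q)^(1-n) has a_t = C(n+t-2, t). Needs n >= 2."""
    return Fraction(comb(n + t - 2, t), k_coeff(n, t))


def h_variance(n, t):
    """a_t / k_t for g(Q) = Q/(1-Q)^2; G(Q) = Q(1-Q)^-(n+2) has a_t = C(n+t, t-1)."""
    a_t = comb(n + t, t - 1) if t >= 1 else 0
    return Fraction(a_t, k_coeff(n, t))


def expectation(h, n, q, terms=500):
    """Truncated E{h(T)} under p_Q(t) = C(n+t-1, t) P^n Q^t."""
    p = 1.0 - q
    return sum(float(h(n, t)) * k_coeff(n, t) * p ** n * q ** t for t in range(terms))


print(h_defective(4, 3), h_variance(4, 3))
print(expectation(h_defective, 3, 0.4), expectation(h_variance, 3, 0.4))
```

The first line should agree with (n − 1)/(n + t − 1) and (n + t)t/{(n + 1)n}; the second should be close to 1 − Q and Q/(1 − Q)² for the chosen Q.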





(b) The Poisson distribution. Let X have the Poisson distribution with parameter λ, that is

$$ p_\lambda(x) = \frac{\lambda^{x} e^{-\lambda}}{x!}. $$

If a sample of n independent observations is taken on X, then it is well known that T = Σ_{i=1}^{n} X_i is a sufficient and complete statistic for λ, and it has the Poisson distribution with parameter nλ. Suppose we wish to find an unbiased estimate of the probability that X = 0 or 1, that is, of

$$ g(\lambda) = e^{-\lambda}(1+\lambda). $$

Then

$$ a_t = \frac{(n-1)^{t}}{t!}\left(1 + \frac{t}{n-1}\right) \qquad\text{and}\qquad k_t = \frac{n^{t}}{t!}, $$

and so we have

$$ h(t) = \left(\frac{n-1}{n}\right)^{t}\left(1 + \frac{t}{n-1}\right). $$


This is, of course, the unique UMV unbiased estimate of the probability that X will be zero or one. 
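A quick numerical check of unbiasedness in the Poisson case: summing h(t) against the Poisson distribution of T with mean nλ should return e^{−λ}(1 + λ). The sketch below assumes n ≥ 2; the function names and the truncation point are illustrative choices, not part of the original note.

```python
import math


def h_poisson(t, n):
    """h(t) = ((n-1)/n)^t * (1 + t/(n-1)), the estimate of P(X = 0 or 1). Needs n >= 2."""
    return ((n - 1) / n) ** t * (1 + t / (n - 1))


def expected_h(lam, n, terms=200):
    """Truncated E{h(T)} when T is Poisson with mean n*lam."""
    mean = n * lam
    pmf = math.exp(-mean)  # P(T = 0)
    total = 0.0
    for t in range(terms):
        total += h_poisson(t, n) * pmf
        pmf *= mean / (t + 1)  # recurrence P(T = t+1) = P(T = t) * mean / (t+1)
    return total


lam, n = 0.7, 5
print(expected_h(lam, n), math.exp(-lam) * (1 + lam))
```

The two printed numbers should agree to rounding error for any moderate λ and n.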


Thanks are due to Prof. F. J. Anscombe, whose valuable discussion and encouragement it is a pleasure 
to acknowledge. 
On Nair's transformation of the correlation coefficient


By MUNUSWAMY SANKARAN 


Presidency College, Madras 


Fisher's z-transformation of the correlation coefficient r is well known to many readers. Recently Harley (1956), while disproving a conjecture of Hotelling (1953), studied some further properties of sin⁻¹ r due to W. F. Sheppard. Little is known, however, of a statistic suggested by U. S. Nair. It is proposed here to give some new results for the distribution of Nair's statistic, studied by Pillai (1946), and also for an allied statistic. The statistic suggested by Nair is x = (r − ρ)/(1 − ρr) which, like r, ranges between −1 and +1.

The results that emerge out of the present study are:

(i) x and an angular transformation of x, namely sin⁻¹ x, are both asymptotically normally distributed and sin⁻¹ x is more nearly normal than z.

(ii) The first and second approximations of sin⁻¹ x are as good as the corresponding approximations of z in evaluating the probability integral of r.

(iii) Applying the C.F. form of the Edgeworth expansion to sin⁻¹ x for the evaluation of the probability integral of r it is seen that for n = 50, ρ = 0.9 it is as accurate as z, but for n = 25, ρ = 0.9 it is not so good.

Although it is not clear that the transformation sin⁻¹ x has any practical advantage over the z-transformation, indeed it is not appropriate in some situations where z is useful, nevertheless, the following results are felt to be of sufficient interest to put on record.


NORMALITY OF NAIR'S STATISTIC


That the statistic x = (r − ρ)/(1 − ρr) is asymptotically normally distributed can be established with the help of a result of Cramér's (1951, Ex. 23, p. 259). To find the rapidity of convergence to normality we evaluate the first four moments. For this, we expand the various powers of x, namely x, x², x³ and x⁴, in the form of series of powers of (r − ρ), retaining terms up to (r − ρ)⁶, and take the expectations. Since the expected values of (r − ρ)^h for h = 1 to 6 are given by Hotelling (1953), the moments of x are obtained without difficulty. The following are the first four moments:


Pp pt+p® , p—2p*+3p* ) 
From the above the first four cumulants are obtained and are
eH pt+p® _ p—2p*+3p* . 
“() = 1) ae—1* len—1) 
‘ p? p?—p* 
ae = SS 
(2) = PtP . 
 ——_ 
6 
K(x) = aoe J 
Let us now consider the angular transformation of x given by

$$ y = \sin^{-1} x = \sin^{-1}\left(\frac{r-\rho}{1-\rho r}\right). $$
The moments of y are given by 
re 3p+p* | 3p+3p* . 
MY) = FR -1) t an—1)! 16(n—1)3 *” 
Be Cee ae 4—p*? 16+3p?—9p4 
MY) = 74+ Gn)  24(n—1)2 *°" a 
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ais ae 9p + 6p* 
MY) = 1 t Bm” 
Hence, the cumulants of sin⁻¹ x are given by











eaey 3p+p >  3p+3p 
iY) = dn —1) * in)? * lem—)?t” 

sc ay. 2—p*  8-—3p?—6p4 
rly) = n—-1' 3n—1) 12-1) °°” 

clita > (4) 
K3(y) = 2X(n—1)* Tt eeey 
2 

K,(y) = “ae ) 


The above expressions for the cumulants of sin⁻¹ x can be compared with the corresponding expressions of z given by
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K,(z) = aaail?+ n—1 +}. ‘ 


Comparison of the 3rd and 4th cumulants shows that y is more nearly normal than z.


PROBABILITY INTEGRAL OF r


Harley, in one of her recent papers (1954) on the probability integral of r, applied the Edgeworth expansion to the z-transformation of Fisher, because of the inadequacy of the z-transformation for values of ρ near unity and because the exact values are not available in David's tables between n = 25 and n = 50.

We shall also apply the Cornish-Fisher form of the Edgeworth expansion to y = sin⁻¹ x for n = 25, ρ = 0.9 and n = 50, ρ = 0.9. Also, we shall use two other approximations, called the first and second approximations of y. The first approximation of y assumes that y is normal with mean zero and variance 1/(n − 2). The second approximation assumes that y is normal with mean and variance given by (4).
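The first approximation is simple enough to compute directly. The sketch below evaluates P(r ≤ r₀) from y with variance 1/(n − 2) as stated above, and, for comparison, from Fisher's z. For z the variance 1/(n − 3) is an assumption borrowed from the confidence interval section; it appears to reproduce the "Approximation I" column of Table 1. Function names are hypothetical.

```python
from math import asin, atanh, sqrt
from statistics import NormalDist


def prob_from_y(r0, rho, n):
    """First approximation from y = arcsin x: y ~ N(0, 1/(n-2)), x = (r0-rho)/(1-rho*r0)."""
    x = (r0 - rho) / (1.0 - rho * r0)
    return NormalDist().cdf(asin(x) * sqrt(n - 2))


def prob_from_z(r0, rho, n):
    """First approximation from Fisher's z, assuming variance 1/(n-3)."""
    return NormalDist().cdf((atanh(r0) - atanh(rho)) * sqrt(n - 3))


# Case n = 25, rho = 0.6, as in Table 1.
for r0 in (0.2, 0.4, 0.6, 0.8):
    print(r0, round(prob_from_z(r0, 0.6, 25), 5), round(prob_from_y(r0, 0.6, 25), 5))
```

Both approximations give exactly 0.5 at r₀ = ρ, which is why the tabulated first approximations read 0.50000 at r = 0.6.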


Table 1. Comparison of exact and approximate values of the probability integral of r.

Case n = 25, ρ = 0.6

                         From z                    From y
  r       Exact     Approxi-    Approxi-      Approxi-    Approxi-
                    mation I    mation II     mation I    mation II
 0.2     0.00934    0.01072     0.00884       0.01182     0.00979
 0.3     0.03097    0.03598     0.03078       0.03624     0.03103
 0.4     0.09046    0.10311     0.09147       0.10079     0.08938
 0.5     0.22771    0.24994     0.22971       0.24590     0.22584
 0.6     0.47500    0.50000     0.47521       0.50000     0.47515
 0.7     0.77782    0.79299     0.77584       0.79703     0.77998
 0.75    0.89652    0.90531     0.89843       0.90736     0.89755
 0.8     0.96741    0.97140     0.96769       0.97084     0.96702
 0.85    0.99469    0.99586     0.99520       0.99488     0.99406
Tables 1 and 2 give a comparison of the exact values with the various approximations of y and z. The conclusion reached is that the first and second approximations of y are as good as the corresponding approximations of z and that the Edgeworth expansion of y is as successful as the corresponding expansion of z for n = 50, ρ = 0.9, but not so good as z for n = 25, ρ = 0.9.

Table 2

                         From z                      From y
  r       Exact     Approxi-    Edgeworth       Approxi-    Edgeworth
                    mation II   expansion       mation II   expansion
                                of z                        of y
(a) Case n = 25, ρ = 0.9
 0.75    0.00743    0.00698     0.00742         0.00785     0.00745
 0.82    0.05574    0.05589     0.05578         0.05526     0.05570
 0.84    0.09859    0.09981     0.09862         0.09740     0.09847
 0.87    0.22387    0.22576     0.22379         0.22196     0.22359
 0.90    0.46244    0.46250     0.46247         0.46239     0.46244
 0.93    0.78645    0.78442     0.78661         0.78850     0.78663
 0.95    0.94612    0.94609     0.94612         0.94620     0.94606
 0.965   0.99263    0.99325     0.99264         0.99197     0.99256

(b) Case n = 50, ρ = 0.9
 0.82    0.01285    0.01265     0.01286         0.01307     0.01288
 0.85    0.05998    0.06024     0.06001         0.05975     0.05999
 0.88    0.23202    0.23294     0.23200         0.23119     0.23204
 0.90    0.47403    0.47405     0.47404         0.47402     0.47404
 0.92    0.77108    0.77009     0.77112         0.77202     0.77108
 0.93    0.88871    0.88814     0.88872         0.88928     0.88872
 0.95    0.99174    0.99204     0.99174         0.99144     0.99172











CONFIDENCE INTERVAL FOR ρ


The confidence interval can be obtained from sin⁻¹ x. Assuming that sin⁻¹{(r − ρ)/(1 − ρr)} is normal with mean zero and variance 1/(n − 2), let 1 − α be a preassigned confidence coefficient chosen sufficiently high. If a is such that

$$ P\left[-a \le \sin^{-1}\left(\frac{r-\rho}{1-\rho r}\right)\sqrt{n-2} \le a\right] = 1-\alpha, $$

where a satisfies

$$ (2\pi)^{-1/2}\int_{a}^{\infty} e^{-t^{2}/2}\,dt = \tfrac{1}{2}\alpha, $$

then the above is equivalent to

$$ \frac{r-d}{1-rd} \le \rho \le \frac{r+d}{1+rd}, \tag{A} $$

where d = sin{a/√(n − 2)}.
The following example is taken from David's Tables of Correlation Coefficient (1938).

The width of span (x) and length of forearm (y) of twenty males have been measured and the correlation between the two variates is found to be 0.55. Assuming that width of span and length of forearm are both approximately normally distributed, what interval will cover the correlation coefficient between x and y in the population?

From David's charts I and II for α = 0.05, 0.215 < ρ < 0.755 and for α = 0.025, 0.140 < ρ < 0.790. Then using (A) and z with variance 1/(n − 3) we get

α = 0.05:    from y: 0.217 < ρ < 0.768,    from z: 0.216 < ρ < 0.769,
α = 0.025:   from y: 0.138 < ρ < 0.800,    from z: 0.142 < ρ < 0.798.
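The limits "from y" follow directly from the inequality for ρ in terms of d = sin{a/√(n − 2)}. The sketch below takes the normal deviate a as its input, which sidesteps the question of how α is attached to the tails; the function name is hypothetical. With n = 20 and r = 0.55, the deviates 1.645 and 1.960 appear to reproduce the two pairs of limits quoted from y.

```python
from math import sin, sqrt
from statistics import NormalDist


def nair_interval(r, n, a):
    """Limits (r-d)/(1-r*d) and (r+d)/(1+r*d) with d = sin(a / sqrt(n-2)).

    Assumes a / sqrt(n-2) is below pi/2, so that the sine is monotone.
    """
    d = sin(a / sqrt(n - 2))
    return (r - d) / (1 - r * d), (r + d) / (1 + r * d)


a_first = NormalDist().inv_cdf(0.95)    # about 1.645
a_second = NormalDist().inv_cdf(0.975)  # about 1.960

print([round(v, 3) for v in nair_interval(0.55, 20, a_first)])
print([round(v, 3) for v in nair_interval(0.55, 20, a_second)])
```

Unlike the z-based limits, these are not symmetric on any transformed scale in r itself, but they stay inside (−1, 1) by construction since |d| < 1.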





In conclusion, it is a pleasure to acknowledge the help I have received from my students in the preparation of the tables. Particular mention must be made of Mr G. Balakrishnan, Mr R. Raman, Mr S. Rajagopalan, Mr V. Sivakumaran and Mr S. R. Srinivasavaradan. I also wish to express my sincere thanks to Prof. E. S. Pearson for his helpful comments on earlier versions of the paper.
Short proof of Miss Harley's theorem on the correlation coefficient


By H. E. DANIELS, University of Birmingham 
AND M. G. KENDALL, Research Techniques Unit, London School of Economics 


1. In considering the correlation coefficient r based on a sample of n values from a bivariate normal population with correlation parameter ρ, Hotelling (1953) discussed the question: does there exist a function ψ(r), independent of n, such that Eψ(r) = ψ(ρ) for all n? He showed that for functions expressible as a Taylor series such a function could only be of the inverse sine type. Misled by an algebraical slip he also concluded that even this kind of function failed to provide an answer. The error was noted by Harley (1956) who proved that for even n

$$ E(\sin^{-1} r) = \sin^{-1}\rho. \tag{1} $$

Later (1957) she proved that this holds also for odd n. Her demonstrations are rather long and involved, and the following not only has the merit of being much shorter, but exhibits the basic reason for the existence of the relationship (1).

2. If a pair of values (x₁, y₁), (x₂, y₂) are chosen at random from a bivariate normal population there is said to be a concordance of type 1 if x₁ − x₂ and y₁ − y₂ have the same sign (cf. Kendall, 1955); or equivalently if

$$ \operatorname{sgn}(x_1 - x_2)\,\operatorname{sgn}(y_1 - y_2) = 1. \tag{2} $$

The probability that this is so is (2/π) sin⁻¹ ρ (Kendall, 1955). Consider now a sample of n values (x, y) from the population and a randomly chosen subset of two, say (x₁, y₁), (x₂, y₂), from this sample. Let C represent a concordance of type 1 in the subset. Since r can be regarded as sufficient for ρ in the subclass of scale-invariant statistics, we have

$$ \frac{2}{\pi}\sin^{-1}\rho = P(C \mid \rho) = \int_{-1}^{1} P(C \mid r)\,f(r \mid \rho)\,dr = E\,P(C \mid r). \tag{3} $$

To establish (1) we have only to prove that

$$ P(C \mid r) = (2/\pi)\sin^{-1} r. $$

No generality is lost if we measure x and y from the sample means. This done, we have for r

$$ r = \frac{\sum_{i=1}^{n} x_i y_i}{\{\sum x_i^{2} \sum y_i^{2}\}^{1/2}} = \cos\theta, \text{ say}. \tag{4} $$






















For fixed r the vectors x and y in the sample space are at a fixed angle θ to each other and otherwise randomly orientated in the space Σx_i = 0, whatever the value of ρ. Consider E{sgn(x₁ − x₂) sgn(y₁ − y₂) | r}. sgn(x₁ − x₂) sgn(y₁ − y₂) is +1 if x, y are both on the same side of the hyperplane x₁ − x₂ = 0, and −1 in the opposite case. For any fixed orientation of the two-dimensional plane spanned by (x, y) the hyperplane x₁ − x₂ = 0 cuts it in a line, and the probability that this line lies between x and y is θ/π = cos⁻¹ r/π = p, say. This is therefore the probability, for fixed r, that x and y are on opposite sides of x₁ − x₂ = 0 for random orientations of (x, y). Hence

$$ E\{\operatorname{sgn}(x_1 - x_2)\operatorname{sgn}(y_1 - y_2) \mid r\} = -p + (1-p) = 1 - 2p = 1 - (2/\pi)\cos^{-1} r = (2/\pi)\sin^{-1} r $$

and the result (1) follows.
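Relation (1) can be illustrated by simulation. The sketch below draws repeated samples from a bivariate normal population, computes r for each, and averages sin⁻¹ r; the average should lie close to sin⁻¹ ρ even for small n. The sample size, number of replications and seed are arbitrary illustrative choices, and the function names are hypothetical.

```python
import random
from math import asin, sqrt


def sample_r(n, rho, rng):
    """Sample correlation coefficient from n bivariate normal pairs with correlation rho."""
    xs, ys = [], []
    for _ in range(n):
        u, v = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        xs.append(u)
        ys.append(rho * u + sqrt(1.0 - rho * rho) * v)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    # Clamp to guard against rounding just outside [-1, 1].
    return max(-1.0, min(1.0, sxy / sqrt(sxx * syy)))


def mean_arcsin_r(n, rho, reps=20000, seed=1):
    """Monte Carlo estimate of E(arcsin r)."""
    rng = random.Random(seed)
    return sum(asin(sample_r(n, rho, rng)) for _ in range(reps)) / reps


print(mean_arcsin_r(6, 0.5), asin(0.5))
```

By contrast, the average of r itself would be expected to fall slightly short of ρ for small n, which is what makes the inverse sine special here.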


3. An immediate consequence is that for any subsample n′ ≤ n from (x₁, y₁), …, (x_n, y_n) the rank correlation coefficient t′ (the sample value of τ) is such that

$$ E(t' \mid r) = (2/\pi)\sin^{-1} r. \tag{5} $$

If we let n tend to infinity we find the familiar result

$$ E(t' \mid \rho) = (2/\pi)\sin^{-1}\rho. \tag{6} $$

It also follows, on taking n′ = n, that the regression of t on r is given by

$$ E(t \mid r) = (2/\pi)\sin^{-1} r \tag{7} $$

whatever the value of ρ.
Runs in a ring


By D. E. BARTON AND F. N. DAVID


University College London 


Some time ago when we were working on the distribution and properties of runs of multiple colours in 
a line we worked out the corresponding theory for runs in a ring, but were discouraged from publication 
by the criticisms that essentially no new mathematical points were involved and that there was no 
obvious statistical application. We do not agree with these criticisms, and the recent paper by Dawson & 
Good (1958) indicates that there is some interest in the runs in a ring problem. Accordingly, we give 
here a summary of results and some tables. 

r identical beads of k colours are supposed in a random order. The problem of the ring has two facets. First, the ring may be supposed to have been built up by sampling randomly from a finite population of beads, the beads being strung on a thread when selected and the ends of the thread tied after r beads. Secondly, a handful of beads r₁, r₂, …, r_k (Σr_i = r) can be imagined as placed in a circle in which case symmetries will need to be allowed for.

The first problem where the ring is the line bent to a circle can be solved either from first principles (Whitworth (1886) gives a method) or by adopting the method for the line. If there are T runs in the line, i.e. T − 1 alternations of colour, there will be T − 1 runs in the ring if the same colour is at the beginning and the end of the line, and T otherwise. The appropriate multinomial term and the number of permutations are given for r = 2, 3, …, 12 and two, three and four colours in Table 1. If S = r − T the mth factorial moment of S is just r/(r − m) times the mth factorial moment of the same statistic calculated for the line (Barton & David (1957)). The same limits, the normal and Poisson, hold under the same conditions, and the positive binomial with the correct first two moments is again a suitable approximating function. Given that the beads are not random in the ring, but that the probabilities are those of a simple Markoff chain, the distribution of T under this hypothesis may be derived on precisely the same lines set out by David (1947) for the two-colour-line case.
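For small r the ring-run counts for the line bent to a circle can be obtained by brute force: list every distinct linear arrangement of the beads, join the ends, and count the colour changes around the ring. The sketch below is only an enumeration check, not the authors' method; the function names are hypothetical and colours are coded 0, 1, 2, ….

```python
from collections import Counter


def arrangements(counts):
    """Yield each distinct linear arrangement, with counts[c] beads of colour c."""
    remaining = list(counts)
    total = sum(counts)
    seq = []

    def extend():
        if len(seq) == total:
            yield tuple(seq)
            return
        for colour in range(len(remaining)):
            if remaining[colour]:
                remaining[colour] -= 1
                seq.append(colour)
                yield from extend()
                seq.pop()
                remaining[colour] += 1

    return extend()


def ring_runs(seq):
    """Number of runs when the line is bent into a ring (index -1 wraps to the last bead)."""
    changes = sum(seq[i] != seq[i - 1] for i in range(len(seq)))
    return changes if changes else 1  # a single-colour ring is one run


def ring_run_table(counts):
    """Map: number of ring runs -> number of linear arrangements giving it."""
    return dict(sorted(Counter(ring_runs(s) for s in arrangements(counts)).items()))


print(ring_run_table((3, 3)))     # compare with the (3²) row for two colours
print(ring_run_table((2, 2, 1)))  # compare with the (2²1) row for three colours
```

Dividing each count by the multinomial term gives the probabilities, as the table notes describe.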


Table 1a. (Two colours.) Ring permutations in repeated sampling
(Probabilities are obtained by dividing the number of permutations by the appropriate multinomial term.)

Multinomial term    r    Partition    Runs
2 2 (1?) 2 
3 3 (21) 3 
4 4 (31) 4 
6 4 (2?) > -# 
5 5 (41) S 
10 5 (32) > * 
6 6 (51) 6 
15 6 (42) . = 
20 6 (32) 6 12 2 
7 7 (61) 7 
21 x | (52) 7 14 
35 7 (43) 7 21 : 
8 8 (71) 8 
28 8 (62) , * 
56 8 (53) ; = * 
70 8 (42) 8 36 24 
Table 1b. (Three colours.) Ring permutations in repeated sampling
(Probabilities are obtained by dividing the number of permutations by the appropriate multinomial term.)

Multinomial   r  Partition                              Runs
       term                    3      4      5      6      7      8      9     10     11     12

          6   3  (1³)           6
         12   4  (21²)          8      4
         20   5  (31²)         10     10
         30   5  (2²1)         10     10     10
         30   6  (41²)         12     18
         60   6  (321)         12     18     24      6
         90   6  (2³)          12     18     36     24
         42   7  (51²)         14     28
        105   7  (421)         14     28     42     21
        140   7  (3²1)         14     28     56     28     14
        210   7  (32²)         14     28     70     70     28
         56   8  (61²)         16     40
        168   8  (521)         16     40     64     48
        280   8  (431)         16     40     96     72     48      8
        420   8  (42²)         16     40    112    144     96     12
        560   8  (3²2)         16     40    128    176    144     56

         72   9  (71²)         18     54
        252   9  (621)         18     54     90     90
        504   9  (531)         18     54    144    144    108     36
        756   9  (52²)         18     54    162    252    216     54
        630   9  (4²1)         18     54    162    162    162     54     18
      1,260   9  (432)         18     54    198    333    378    225     54
      1,680   9  (3³)          18     54    216    396    486    378    132

         90  10  (81²)         20     70
        360  10  (721)         20     70    120    150
        840  10  (631)         20     70    200    250    200    100
      1,260  10  (62²)         20     70    220    400    400    150
      1,260  10  (541)         20     70    240    300    360    180     80     10
      2,520  10  (532)         20     70    280    550    760    580    240     20
      4,200  10  (43²)         20     70    320    700  1,100  1,130    680    180
      3,150  10  (4²2)         20     70    300    600    900    780    380    100

        110  11  (91²)         22     88
        495  11  (821)         22     88    154    231
      1,320  11  (731)         22     88    264    396    330    220
      1,980  11  (72²)         22     88    286    594    660    330
      2,310  11  (641)         22     88    330    495    660    440    220     55
      4,620  11  (632)         22     88    374    836  1,320  1,210    660    110
      2,772  11  (5²1)         22     88    352    528    792    528    352     88     22
      6,930  11  (542)         22     88    418    957  1,716  1,848  1,276    517     88
      9,240  11  (53²)         22     88    440  1,100  2,046  2,508  2,024    880    132
     11,550  11  (4²3)         22     88    462  1,188  2,310  3,058  2,662  1,408    352

        132  12  (10,1²)       24    108
        660  12  (921)         24    108    192    336
      1,980  12  (831)         24    108    336    588    504    420
      2,970  12  (82²)         24    108    360    840  1,008    630
      3,960  12  (741)         24    108    432    756  1,080    900    480    180
      7,920  12  (732)         24    108    480  1,200  2,088  2,220  1,440    360
      5,544  12  (651)         24    108    480    840  1,440  1,200    960    360    120     12
     13,860  12  (642)         24    108    552  1,416  2,880  3,630  3,120  1,620    480     30
     18,480  12  (63²)         24    108    576  1,608  3,384  4,740  4,640  2,640    720     40
     16,632  12  (5²2)         24    108    576  1,488  3,168  4,176  3,840  2,304    792    156
     27,720  12  (543)         24    108    624  1,812  4,104  6,396  7,008  5,076  2,160    408
     34,650  12  (4³)          24    108    648  1,944  4,536  7,506  8,712  6,912  3,456    804














Table 1c. (Four colours.) Ring permutations in repeated sampling
(Probabilities are obtained by dividing the number of permutations by the appropriate multinomial term.)
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Multi- 
| nomial 
term 


24 
60 





120 
180 
210 
420 
630 


336 
840 
1,120 
1,680 
2,520 


504 
1,512 
2,520 
| 3,780 
| 5,040 
7,560 


720 
2,520 
| 5,040 
7,560 
6,300 
12,600 
18,900 
16,800 
25,200 


990 
| 3,960 
| 9,240 
| 13,860 
13,860 
27,720 
| 41,580 
| | 34,650 
| 

| 

| 





46,200 
| 69,300 
92,400 


1,320 

5,940 

| 15,840 
i | 23,760 
| 27,720 
55,440 
83,160 
33,264 
83,160 
110,880 
166,320 
138,600 
207,900 
| 277,200 
369,600 


re SOS ~S 





| 





| 


of 


OOMMOH WWAUBDD 


eooeoee 





Partition 


(1*) 
(21%) 


(31%) 
(2? 12) 
(41°) 
(3212) 
(2° 1) 


(51%) 
(4212) 
(3° 1°) 
(32? 1) 

(2*) 


(61%) 
(5212) 
(4312) 
(42? 1) 
(3? 21) 

(32%) 


(71%) 
(6212) 
(5312) 
(52? 1) 
(4? 1?) 
(4321) 

(42%) 
(3° 1) 
(3? 2?) 


(813) 
(7212) 
(6312) 
(622 1) 
(5412) 
(5321) 

(52*) 
(42 21) 
(43? 1) 
(4322) 
(38 2) 


(913) 
(8212) 
(7312) 
(722 1) 
(6412) 
(6321) 

(62°) 
(5? 1?) 
(5421) 
(532 1) 
(5322) 
(42 31) 
(42 2?) 
(432 2) 

(3*) 


12 
72 
42 
182 
252 


96 
336 
416 
496 
576 


180 
540 
720 
810 
900 
990 


300 

800 
1,100 
1,200 
1,200 
1,400 
1,500 
1,500 
1,600 


462 
1,122 
1,562 
1,672 
1,782 
2,002 
2,112 
2,112 
2,222 
2,332 
2,442 


672 
1,512 
2,112 
2,232 
2,472 
2,712 
2,832 
2,592 
2,952 
3,072 
3,192 
3,192 
3,312 
3,432 
3,552 


1 


70 
210 


240 
320 
640 
960 


540 
810 
1,350 
1,620 
2,160 


1,000 
1,600 
2,400 
1,800 
3,100 
3,900 
3,600 
4,400 


9,360 
5,760 
9,000 
10,080 
11,520 
10,800 
12,240 
13,320 
14,400 





| 
| 
| 
| 
| 
| 


Runs 

8 9 10 ll 12 

24 

144 

304 

744 

108 

540 126 

1,080 216 

1,530 666 
2,700 1,386 

300 

1,320 560 40 
2,520 960 60 

1,800 840 240 
4,050 2,840 790 
6,300 5,340 1,440 
5,100 4,440 1,740 
7,700 7,640 3,440 

660 
2.640 1,540 220 
4,840 2,640 330 
4,092 2,772 1,188 198 
8,272 7,612 3,652 484 
12,012 13,332 6,534 792 
9,570 9,834 5,478 1,518 
11,550 13,574 9,218 2,728 
16,060 22,604 16,258 5,038 
18,810 27,654 24,618 10,098 

1,260 
4,680 3,360 720 
8,280 5,760 1,080 
7,740 6,720 3,600 1,080 60 | 
14,640 16,320 10,440 2,640 120 
20,340 27,120 18,360 4,320 180 
8,928 8,064 5,184 1,728 360 
18,324 23,616 18,576 8,424 1,620 
21,624 30,816 28,080 14,016 2,544 
28,584 46,656 46,584 24,768 4,368 
24,120 36,768 36,144 21,168 5,760 
31,500 54,288 59,184 36,288 10,440 
36,060 66,528 80,784 58,008 18,420 
41,040 80,448 107,424 88,128 33,960 
The second problem discussed partially by Jablonski (1892) is probably the one which Whitworth really had in mind. Jablonski imagined r_i (i = 1, 2, …, k) beads of k different colours set down in a ring. He enumerated the total number of different arrangements which could be made of these beads allowing for rotations and symmetries but not for turning the ring over. Except where there are common factors among {r_i} the total number of arrangements will be

$$ (r-1)! \Big/ \prod_{i=1}^{k} r_i!. $$

Further, the distribution of the number of runs will be the same as the distribution of the number of runs in the linear problem with the common factor r cancelled out. We show here that Jablonski's method for the enumeration of different arrangements in the ring may be shortened and extended to give the distribution of runs also. For clarity we will refer to these runs as Jablonski runs.

Consider any given arrangement in the ring which we shall call a ring permutation, A. There will be r linear arrangements of A, say A₁, A₂, …, A_r, which we may obtain by cutting the ring at the r possible points. The set of these, which we will call S(A), consists of the r cyclic permutations of any one of them and we will let S(A) contain just m_j different linear permutations. If d_j is one of the divisors of the highest common factor h, say, of r₁, r₂, …, r_k, then m_j = r/d_j. Suppose there are n divisors and let


ee ee See 
Further, let A_j be that member of S(A) consisting of the juxtaposition of d_j similar arrays of m_j beads with

$$ r_1/d_j,\; r_2/d_j,\; \ldots,\; r_k/d_j $$

of the respective colours. If Q_{d_j} d_j/r is the number of ring permutations which give linear arrays with just m_j different line permutations, then

$$ \sum_{d_j \mid d \mid h} Q_d = \frac{(r/d_j)!}{(r_1/d_j)! \cdots (r_k/d_j)!} = T_{d_j} \ \text{(say)} \qquad (j = 1, 2, \ldots, n). $$

These n linear equations for the {Q_d} have a unique solution so that the total number of Jablonski arrangements is

$$ J = \frac{1}{r}\sum_{d \mid h} d\,Q_d. $$

Let φ(d) denote Euler's φ-function, i.e. if d is a positive integer, φ(d) will denote the number of positive integers not exceeding d which are relatively prime to d. We have that

$$ \sum_{j=1}^{n} \phi(d_j)\,T_{d_j} = \sum_{j=1}^{n} \phi(d_j) \sum_{d_j \mid d \mid h} Q_d = \sum_{d \mid h} Q_d \sum_{d_j \mid d} \phi(d_j) = \sum_{d \mid h} d\,Q_d = rJ. $$

Now let Q_d(t) d/r be the number of those ring permutations with linear arrangements containing just r/d line permutations which have t ring runs; let A′ be such a ring permutation and A′_j a corresponding line permutation. A′_j consists of d similar line arrays each of which is an unrolling of a ring permutation of r/d beads of colour composition (r₁/d, …, r_k/d) and this ring permutation has t/d ring runs. If p_{d_j}(t) is the proportion of permutations of elements (r₁/d_j, …, r_k/d_j) with t/d_j ring runs where repetitions are allowed, then

$$ \sum_{d_j \mid d \mid h} Q_d(t) = T_{d_j}\,p_{d_j}(t) $$

and

$$ J(t) = \frac{1}{r}\sum_{d \mid h} d\,Q_d(t) = \frac{1}{r}\sum_{d \mid h} \phi(d)\,T_d\,p_d(t) $$

is the number of ring permutations with t ring runs when repeated ring permutations are not allowed.
A worked example will illustrate the method given in the previous section. Suppose r = 12 and we have three colours (6, 4, 2). It is seen that h = 2 so that

$$ Q_1 + Q_2 = 13{,}860, \qquad Q_2 = 60 $$

and J = 1150 + 10 = 1160. Or alternatively since φ(1) = 1 = φ(2) we have J = (1/12)(13,860 + 60) = 1160.

Further, T₁p₁(t)/r takes the values 2, 9, 46, 118, 240, 302½, 260, 135, 40, 2½ as t runs through the integers 3, 4, 5, …, 12, whilst T₂p₂(t)/r takes the values 1, 1½, 2, ½ at the values 6, 8, 10 and 12 and is zero elsewhere. Thus J(t) takes the values 2, 9, 46, 119, 240, 304, 260, 137, 40, 3 with a total of 1160 (= J) as expected. Values of J(t) are given in Table 2 for those partitions of r ≤ 12, where the highest common factor of the parts is 2 or more than 2. Table 3 gives the values of Euler's φ-function necessary for the enumeration of ring-runs up to and including r = 12.
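The worked example can be reproduced numerically from the relation J(t) = (1/r) Σ φ(d) T_d p_d(t), taken over the divisors d of h. In the sketch below T_d p_d(t) is obtained by brute-force enumeration of the reduced composition (r₁/d, …, r_k/d), whose arrangements with s ring runs contribute at t = ds. This is an illustrative check with hypothetical function names, practical only for small r, and it assumes at least two colours are present.

```python
from collections import Counter
from fractions import Fraction
from functools import reduce
from math import gcd


def arrangements(counts):
    """Yield each distinct linear arrangement, with counts[c] beads of colour c."""
    remaining = list(counts)
    total = sum(counts)
    seq = []

    def extend():
        if len(seq) == total:
            yield tuple(seq)
            return
        for colour in range(len(remaining)):
            if remaining[colour]:
                remaining[colour] -= 1
                seq.append(colour)
                yield from extend()
                seq.pop()
                remaining[colour] += 1

    return extend()


def ring_runs(seq):
    """Colour changes around the ring (index -1 wraps to the last bead)."""
    return sum(seq[i] != seq[i - 1] for i in range(len(seq)))


def phi(d):
    """Euler's function: count of 1 <= m <= d with gcd(m, d) = 1."""
    return sum(1 for m in range(1, d + 1) if gcd(m, d) == 1)


def jablonski_runs(counts):
    """J(t) for each t, i.e. ring permutations with t runs, rotations identified."""
    r = sum(counts)
    h = reduce(gcd, counts)
    weighted = Counter()
    for d in range(1, h + 1):
        if h % d:
            continue
        reduced = [c // d for c in counts]
        # T_d p_d(t): arrangements of the reduced composition with t/d ring runs.
        for s, n_arr in Counter(ring_runs(a) for a in arrangements(reduced)).items():
            weighted[d * s] += phi(d) * n_arr
    return {t: Fraction(total, r) for t, total in sorted(weighted.items())}


j = jablonski_runs((6, 4, 2))
print([int(v) for v in j.values()], int(sum(j.values())))
```

For the composition (6, 4, 2) the printed values can be set against the J(t) sequence and the total J quoted in the worked example.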

We may, if we choose, regard the different ring permutations as forming a fundamental probability set. The jth moment of the distribution of runs is given by

$$ \mu'_j = \sum_{t} t^{j} J(t) \Big/ J. $$

(Only those distributions are given which differ from those of Table 1 divided by r.)


Table 2a. Jablonski runs. (Two colours.)

                              Runs
Total   r  Partition     2    4    6    8   10   12

    2   4  (2²)          1    1
    3   6  (42)          1    2
    4   6  (3²)          1    2    1
    4   8  (62)          1    3
   10   8  (4²)          1    5    3    1
   10   9  (63)          1    5    4
    5  10  (82)          1    4
   22  10  (64)          1    8   10    3
   26  10  (5²)          1    8   12    4    1
    6  12  (10,2)        1    5
   19  12  (93)          1    8   10
   43  12  (84)          1   11   21   10
   80  12  (6²)          1   13   34   26    5    1































Table 2b. Jablonski runs. (Three-colours.) 










Runs 





Partition 
Table 2c. Jablonski runs. (Four colours.)

                                               Runs
 Total   r  Partition      4      5      6      7      8      9     10     11     12

   318   8  (2⁴)           6     24     72    120     96
 1,896  10  (42³)          6     36    150    390    633    534    147
 6,940  12  (62³)          6     48    236    780  1,698  2,260  1,536    360     16
17,340  12  (4²2²)         6     48    276  1,020  2,628  4,524  4,938  3,024    876
30,804  12  (3⁴)           6     48    296  1,200  3,420  6,704  8,952  7,344  2,834











Table 3. Values of Euler's φ-function for enumerating permutations for values of r ≤ 12

  d       1    2    3    4    5    6
  φ(d)    1    1    2    2    4    2





Simple formulae do not flow from this expression but we note that if M, is the mean number of runs when 
repetitions are allowed then 


, M, d—-1 
= J ——= —- a)T 
fa 1, + rd > (=) 9(4) Ta 


It is seen that wy >M, 


with equality if, and only if, the highest common factor of (7,, ...,7),) is unity. 
Some applications of Meijer-G functions to distribution problems in statistics


By D. G. KABE 


Karnatak University, Dharwar (India) 


1. Although the Fourier transform is recognized to be a powerful tool in statistical distribution theory, 
the Mellin transform seems to have been neglected. Epstein (1948) has used Mellin transforms to derive 
certain univariate distribution functions and Nair (1939) has indicated their applications to some 
multivariate problems. The Mellin transform of the frequency function $f(x)$, $(0 < x < \infty)$, of a random
variable X, is defined to be

\[ g(s) = \int_0^\infty x^{s-1} f(x)\,dx, \tag{1} \]

the inverse transform being

\[ f(x) = \frac{1}{2\pi i} \int_{-i\infty}^{i\infty} x^{-s} g(s)\,ds. \tag{2} \]
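Obviously the transform $g(s)$ is the moment $\mu'_{s-1}$ of X; in particular, if this is given by

\[ g(s) = \prod_{j=1}^{n} \frac{\Gamma(b_j+s)\,\Gamma(a_j+1)}{\Gamma(a_j+s)\,\Gamma(b_j+1)}, \tag{3} \]

then f(x) is given by

\[ f(x) = \prod_{j=1}^{n} \frac{\Gamma(a_j+1)}{\Gamma(b_j+1)} \cdot \frac{1}{2\pi i} \int_{-i\infty}^{i\infty} x^{-s} \prod_{j=1}^{n} \frac{\Gamma(b_j+s)}{\Gamma(a_j+s)}\,ds \]
\[ \phantom{f(x)} = \prod_{j=1}^{n} \frac{\Gamma(a_j+1)}{\Gamma(b_j+1)} \cdot \frac{1}{2\pi i} \int_{-i\infty}^{i\infty} \prod_{j=1}^{n} \frac{\Gamma(b_j-s)}{\Gamma(a_j-s)}\, x^{s}\,ds. \tag{4} \]

However, the last integral represents the Meijer-G function

\[ G_{nn}^{n0}\!\left( x \,\middle|\, \begin{matrix} a_1, a_2, \ldots, a_n \\ b_1, b_2, \ldots, b_n \end{matrix} \right). \]

Erdelyi et al. (1953) have studied these functions extensively and it is possible to apply some of their
results to obtain explicit expressions for f(x) in such cases.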

Now it is known that the moments $\mu'_{s-1}$ of many statistics occurring in multivariate analysis can be
expressed in terms of Γ-functions as in (3). Nair (1939) has considered some of these statistics and proved
that their frequency function f(x) appears as the solution of the differential equation

\[ x \prod_{j=1}^{n} \left( x\frac{d}{dx} - a_j + 1 \right) f(x) = \prod_{j=1}^{n} \left( x\frac{d}{dx} - b_j \right) f(x). \tag{5} \]
However, it is known that the Meijer-G function satisfies the above differential equation (Erdelyi et al. 
(1953), p. 210, equation 1). Nair proceeds to solve this in the special cases n = 1 and n = 2. In the next 
section we shall consider one of Nair’s examples and obtain his results by making use of known explicit 
expressions for the G functions. 

2. Consider the distribution of the eas correlation ratio U, whose (s — 1)th moment is (Wilks, 


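1932, p. 484)

\[ g(s) = \prod_{j=1}^{n} \frac{\Gamma\{\tfrac12(n-j)\}}{\Gamma\{\tfrac12(p-j)\}} \prod_{j=1}^{n} \frac{\Gamma\{\tfrac12(p-j-2)+s\}}{\Gamma\{\tfrac12(n-j-2)+s\}}. \tag{6} \]

From (4) we have that the frequency function of U is given by

\[ f(U) = \prod_{j=1}^{n} \frac{\Gamma\{\tfrac12(n-j)\}}{\Gamma\{\tfrac12(p-j)\}}\; G_{nn}^{n0}\!\left( U \,\middle|\, \begin{matrix} a_1, a_2, \ldots, a_n \\ b_1, b_2, \ldots, b_n \end{matrix} \right), \tag{7} \]

where $a_j = \tfrac12(n-2-j)$ and $b_j = \tfrac12(p-2-j)$ $(j = 1, 2, \ldots, n)$.
We consider the following special values of n.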
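(1) n = 1. From Erdelyi et al. (1953, p. 208, equation 5), we find that

\[ G_{11}^{10}\!\left( U \,\middle|\, \begin{matrix} a_1 \\ b_1 \end{matrix} \right) = \frac{U^{b_1}}{\Gamma(a_1-b_1)} \sum_{r=0}^{\infty} \binom{a_1-b_1-1}{r} (-1)^r U^r, \tag{8} \]

so that from (7) and (8) we have the frequency function of U as

\[ f(U) = \frac{\Gamma\{\tfrac12(n-1)\}}{\Gamma\{\tfrac12(p-1)\}\,\Gamma\{\tfrac12(n-p)\}}\; U^{\frac12(p-3)} (1-U)^{\frac12(n-p)-1} \quad (0 < U < 1). \tag{9} \]

(2) n = 2. In this case we have

\[ f(U) = \frac{\Gamma\{\tfrac12(n-1)\}\,\Gamma\{\tfrac12(n-2)\}}{\Gamma\{\tfrac12(p-1)\}\,\Gamma\{\tfrac12(p-2)\}}\; G_{22}^{20}\!\left( U \,\middle|\, \begin{matrix} \tfrac12(n-4),\ \tfrac12(n-4)+\tfrac12 \\ \tfrac12(p-4),\ \tfrac12(p-4)+\tfrac12 \end{matrix} \right). \tag{10} \]

But, from Erdelyi et al. (1953, p. 209, equation 10) we find that

\[ G_{22}^{20}\!\left( U \,\middle|\, \begin{matrix} \tfrac12(n-4),\ \tfrac12(n-4)+\tfrac12 \\ \tfrac12(p-4),\ \tfrac12(p-4)+\tfrac12 \end{matrix} \right) = 2^{n-p-1}\, G_{11}^{10}\!\left( \sqrt{U} \,\middle|\, \begin{matrix} n-4 \\ p-4 \end{matrix} \right) \]
\[ \phantom{G_{22}^{20}} = \frac{2^{n-p-1}}{\Gamma(n-p)}\; U^{\frac12(p-4)} (1-\sqrt{U})^{n-p-1}, \tag{11} \]

and using (8), the frequency function of U is

\[ f(U) = \frac{\Gamma(n-2)}{2\,\Gamma(p-2)\,\Gamma(n-p)}\; U^{\frac12(p-4)} (1-\sqrt{U})^{n-p-1} \quad (0 < U < 1). \tag{12} \]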
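
As a numerical cross-check, the Mellin transform (1) of the frequency functions (9) and (12) should reproduce the moment formula (6) with one and two factors respectively. The sketch below is an illustration only: the midpoint rule, the function names, the argument `m` (a label assumed here for the number of factors in each product) and the values n = 10, p = 6 are all choices made for this example, not part of the original note.

```python
from math import gamma, sqrt


def mellin(f, s, steps=100_000):
    """Midpoint-rule estimate of the integral of x**(s - 1) * f(x) over 0 < x < 1."""
    h = 1.0 / steps
    return h * sum(((k + 0.5) * h) ** (s - 1) * f((k + 0.5) * h) for k in range(steps))


def moment_formula(s, n, p, m):
    """Right-hand side of (6), with m factors in each product (m is an assumed label)."""
    value = 1.0
    for j in range(1, m + 1):
        value *= gamma((n - j) / 2) / gamma((p - j) / 2)
        value *= gamma((p - j - 2) / 2 + s) / gamma((n - j - 2) / 2 + s)
    return value


def density_9(u, n, p):
    """Frequency function (9), the one-factor case."""
    const = gamma((n - 1) / 2) / (gamma((p - 1) / 2) * gamma((n - p) / 2))
    return const * u ** ((p - 3) / 2) * (1 - u) ** ((n - p) / 2 - 1)


def density_12(u, n, p):
    """Frequency function (12), the two-factor case."""
    const = gamma(n - 2) / (2 * gamma(p - 2) * gamma(n - p))
    return const * u ** ((p - 4) / 2) * (1 - sqrt(u)) ** (n - p - 1)


n, p = 10, 6  # illustrative values only
for s in (1.0, 2.5):
    print(s, mellin(lambda u: density_9(u, n, p), s), moment_formula(s, n, p, 1))
    print(s, mellin(lambda u: density_12(u, n, p), s), moment_formula(s, n, p, 2))
```

At s = 1 both transforms equal 1, which confirms that (9) and (12) integrate to unity for these parameter values.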















Miscellanea 







3. As a further application of Meijer-G functions, let us consider the following problem. Let $\rho_1$ and $\rho_2$
be the canonical correlations between two sets of normal variables $(X_1, X_2)$ and $(Y_1, Y_2, \ldots, Y_p)$, $(p \ge 2)$,
and let the variables Q and Z be defined by

\[ Q = \rho_1\rho_2, \qquad Z = (1-\rho_1^2)(1-\rho_2^2). \]

Now if q and z are the sample values of Q and Z, and f(q, z) their joint frequency function, then from
a result of Girshick’s (1939) we obtain





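\[ \int_0^\infty\!\!\int_0^\infty q^{s-1} z^{t-1} f(q,z)\,dq\,dz \tag{13} \]
\[ = \frac{\Gamma(n-1)}{\Gamma(p-1)\,\Gamma(n-p-1)} \cdot \frac{\Gamma(p+s-2)\,\Gamma(n-p-3+2t)}{\Gamma(n+s-4+2t)} \tag{14} \]
\[ = g(s,t) \quad \text{(say)}. \]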


However, (13) is the double Mellin transform of the function f(q, z); it is known that the inverse transform 
is given by 





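\[ f(q,z) = \frac{1}{(2\pi i)^2} \int_{-i\infty}^{i\infty}\!\!\int_{-i\infty}^{i\infty} g(s,t)\, q^{-s} z^{-t}\,ds\,dt \tag{15} \]
\[ \phantom{f(q,z)} = \frac{\Gamma(n-1)}{\Gamma(p-1)\,\Gamma(n-p-1)} \cdot \frac{1}{2\pi i} \int_{-i\infty}^{i\infty} \Gamma(p+s-2)\, q^{-s} \left[ \frac{1}{2\pi i} \int_{-i\infty}^{i\infty} \frac{\Gamma(n-p-3-2t)}{\Gamma(n+s-4-2t)}\, z^{t}\,dt \right] ds. \]

Now the integral within the square brackets in the last expression is the Meijer-G function

\[ \tfrac12\, G_{11}^{10}\!\left( \sqrt{z} \,\middle|\, \begin{matrix} n+s-4 \\ n-p-3 \end{matrix} \right) = \frac{z^{\frac12(n-p-3)} (1-\sqrt{z})^{p+s-2}}{2\,\Gamma(p+s-1)}. \]

From (8) we have then

\[ f(q,z) = \frac{\Gamma(n-1)}{2\,\Gamma(p-1)\,\Gamma(n-p-1)}\; z^{\frac12(n-p-3)} (1-\sqrt{z})^{p-2} \cdot \frac{1}{2\pi i} \int_{-i\infty}^{i\infty} \frac{\Gamma(p+s-2)}{\Gamma(p+s-1)} \left( \frac{q}{1-\sqrt{z}} \right)^{-s} ds \]
\[ \phantom{f(q,z)} = \frac{\Gamma(n-1)}{2\,\Gamma(p-1)\,\Gamma(n-p-1)}\; z^{\frac12(n-p-3)} (1-\sqrt{z})^{p-2}\; G_{11}^{10}\!\left( \frac{q}{1-\sqrt{z}} \,\middle|\, \begin{matrix} p-1 \\ p-2 \end{matrix} \right) \]
\[ \phantom{f(q,z)} = \frac12 \cdot \frac{\Gamma(n-1)}{\Gamma(p-1)\,\Gamma(n-p-1)}\; z^{\frac12(n-p-3)} q^{p-2}, \tag{16} \]
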
a result which agrees with that given by Girshick (1939). 


The author wishes to thank Mr N. U. Prabhu of Karnatak University, Dharwar, for many helpful 
suggestions. 
A note on ‘Further contributions to multivariate confidence bounds’
Biometrika 44 (1957), pp. 399-410


By S. N. ROY AND R. GNANADESIKAN
Institute of Statistics, University of North Carolina


In sections 4b and 4c of the paper, dealing with the general linear hypothesis, starting from (4.15), the
confidence statement (4.19), and similar statements involving the truncation of columns of M are
obtained. The truncation procedure described is essentially equivalent to a truncation of variates and
enables one to start with a u-variate problem and study all the associated (u − 1)-variate problems,
(u − 2)-variate problems, and so on until we get down to the u univariate problems. In this procedure,
therefore, a total of $(2^u - 1)$ confidence statements are obtained.

There is, however, another type of truncation which is very often of great interest to us and especially 
in the univariate general linear hypothesis problem where we do not have the M matrix. To illustrate this 
problem let us consider the problem of testing the equality of several treatment effects in an ordinary
ANOVA problem. With v treatments, the null hypothesis may be stated as $H_0: t_1 = t_2 = \ldots = t_v$. If $H_0$ is
rejected, then a question of some interest usually is what could one say about subsets of the treatment
effects. For example, what can be said about $H_0: t_1 = t_2 = t_3$? Similar problems are of interest in the
multivariate situation also. For example, in the illustrative example of section 5.1, if the six schools
had turned out to be significantly different, i.e. $H_0: \xi_1 = \xi_2 = \ldots = \xi_6$ is rejected, then this is possibly
caused by one or two of the six schools being different from the rest. So, for instance, we might be
interested in studying departures from null hypotheses like $H_0: \xi_1 = \xi_2 = \xi_3$.

This second type of problem can easily be seen to be a problem of truncating the rows of the C matrix. 
This can be done by writing b* = Ob in (4-15), so that we have for our starting point, instead of (4-15), 
the statement 


a*’X*’A,(AjA,)-! Cj b* — (a*’S*a*)! [sc,(u,s,n—r)]8 
<a*’*’b* <a"’X"'A,(A,A,)“ 01 b* + (a*’S*a)! [sc,(u,8,n—r)]}8 


for all non-null a*’s and all non-null b*’s satisfying $b^{*\prime}[C_1(A_1'A_1)^{-1}C_1']\,b^* = 1$. Taking suitable elements
of b* equal to zero, we can now obtain the desired truncations on the rows of the C matrix. Thus we
have, in fact, not merely the $(2^u - 1)$ statements derived in the paper, but $(2^u - 1) \times (2^s - 1)$ confidence
statements which include the original $(2^u - 1)$ statements and also others obtained by truncating the
rows of the C matrix. The joint confidence coefficient is, of course, $\ge (1-\alpha)$, for a preassigned α.


Selection of the population with the largest mean when comparisons 
can be made only in pairs 


By RITA J. MAURICE 
University College London 


1. INTRODUCTION


It has been pointed out by Bechhofer (1954, 1957) that when normal populations of equal variance are to 
be ranked on the basis of their means, experimental designs which eliminate the effects of heterogeneity 
and so reduce the underlying variance of the experiment serve the same function as in the ordinary 
analysis of variance. Thus, if comparisons can be made only in pairs the appropriate standard experimental
design is an incomplete block design with two plots per block. If there are k populations the
simplest balanced design requires $\tfrac12 k(k-1)$ blocks, so that each population appears once in the same
block with each of the other k—1 populations. However, if k were large this would require a large 
number of blocks and some balanced design involving less than k—1 units of each treatment might be 
employed. The block effects may be constants which add to zero over one replication of the experiment 
or random effects which the analysis assumes normally distributed. 

For the simplest ranking into two groups (the ‘best’ and the ‘other’ populations) an alternative 
procedure of successive elimination, following the procedure familiar in cup-ties and tournaments, might 
be used if k and the number of ‘best’ populations were equal to a power of two. Initially the k populations
would be paired at random and a comparison made within blocks. The $\tfrac12 k$ populations with the larger
means would then be paired again at random and similar comparisons made resulting in $\tfrac14 k$ populations
to be further compared. This procedure would be continued until the number of populations remaining 
was equal to the number to be selected. If only one population is to be selected one replication of this 
procedure requires k— 1 blocks. 

The simplest case for which these two methods can be compared is k = 4 with the selection of one
of these as the best. For this case one replication of the fully balanced design requires only six blocks and
it seems reasonable to assume that it would be used. For the cup-tie procedure, a multiple of three blocks
would be required.

If it is desired to detect a difference δ between the largest and second largest means with probability at
least P, it must be assumed that the third and fourth means are equal to the second largest (Bechhofer
(1954); Somerville (1954)). The most unfavourable configuration of the population means is therefore
$\theta_1 - \delta = \theta_2 = \theta_3 = \theta_4$, where $\theta_1$ is the largest mean.


2. FOUR POPULATIONS: BLOCK EFFECTS CONSTANT


A fully balanced incomplete block design for four populations requires six blocks for each replication, 
each population being tested three times in one replication. If the effects of a block on the result are 
constant the model is of the form 


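\[ x_{ij} = \theta_i + \beta_j + z_{ij} \quad (i = 1, 2, 3, 4;\ j = 1, 2, \ldots, 6), \]

where $\sum_{j=1}^{6} \beta_j = 0$, $E(z) = 0$, $E(z^2) = \sigma^2$. For this model the estimate of $\theta_i$ is given by

\[ t_i = \tfrac13 \sum_j x_{ij} + \tfrac1{12} \Big\{ 2\sum_j x_{ij} - 3\sum_j (x_{ij} + x_{i'j}) + \sum_j \sum_m x_{mj} \Big\} \]
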

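(Cochran & Cox (1950), p. 264). $i'$ here indicates the population tested in the same block as population $\pi_i$.
The difference between two population means, $\theta_i - \theta_l$, is therefore estimated by

\[ t_i - t_l = \tfrac13 \Big\{ \sum_j x_{ij} - \sum_j x_{lj} \Big\} + \tfrac1{12} \Big\{ 2\sum_j x_{ij} - 2\sum_j x_{lj} - 3\sum_j (x_{ij} + x_{i'j}) + 3\sum_j (x_{lj} + x_{l'j}) \Big\}, \]

which reduces to

\[ t_i - t_l = \tfrac14 \sum_j (x_{ij} - x_{i'j}) - \tfrac14 \sum_j (x_{lj} - x_{l'j}) \]
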

and is therefore based entirely upon within block comparisons. 

The variance of this estimate of the difference between two means is $\sigma^2$, and the covariance of two such
estimates $t_i - t_l$, $t_i - t_m$ is $\tfrac12\sigma^2$. If there are n replications of the experiment the variance becomes $\sigma^2/n$. Thus
$g_i = (t_1 - t_i)\sqrt{n}/\sigma$ has a variance-covariance matrix

\[ \begin{pmatrix} 1 & \tfrac12 & \tfrac12 \\ \tfrac12 & 1 & \tfrac12 \\ \tfrac12 & \tfrac12 & 1 \end{pmatrix}. \]

The probability of a correct choice of the population with the largest mean is given by

\[ \Pr\{t_1 > \max(t_2, t_3, t_4)\} = \Pr\{t_1 - t_i > 0\ (i = 2, 3, 4)\} = F(\delta\sqrt{n}/\sigma,\ \delta\sqrt{n}/\sigma,\ \delta\sqrt{n}/\sigma), \]

where F(x, x, x) is the incomplete normal trivariate integral with the variance-covariance matrix given
above.

In the tables given by Bechhofer (1954) this corresponds to the case k = 4, t = 1 and $\sqrt{N}\lambda = \sqrt{2}\,E(g_i)$.
These tables give $\sqrt{2}\,x$ for a number of values of P = F(x, x, x). Squaring the tabulated value and dividing
by two gives the number of replications, n, in units of $\sigma^2/\delta^2$. The cost in units of this multiple of the assumed
cost per plot, c, is then twelve times this figure. Results are given in Table 1.

If a cup-tie procedure is used each replication comprises three blocks instead of six. The probability of
detecting a given difference between the best and the other three means is now given by $\{F(\delta\sqrt{n}/(\sqrt{2}\sigma))\}^2$,
where F(x) is the incomplete normal integral and n is the number of blocks in which each comparison is
made. Taking the same probabilities of a correct decision as given in Table 1, from the square roots
$x' = \delta\sqrt{n}/(\sqrt{2}\sigma)$ may be obtained using tables of the normal integral. $2x'^2$ is equal to $n\delta^2/\sigma^2$ and six times
this gives the cost (in units of $c\sigma^2/\delta^2$) of attaining these probabilities. Results are given in Table 2.
Comparison shows that in every case the cost of the cup-tie procedure is smaller.
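
The cup-tie calculation just described can be sketched in a few lines: take the square root of P, invert the normal integral to get x', then form 2x'² and six times that figure. The code below is an illustration using Python's standard library; the function name `cup_tie_requirements` is a hypothetical label, and the printed figures are meant for comparison with Table 2.

```python
from statistics import NormalDist


def cup_tie_requirements(p_correct: float) -> tuple[float, float]:
    """Return (n * delta^2 / sigma^2, cost in units of c * sigma^2 / delta^2)."""
    x_prime = NormalDist().inv_cdf(p_correct ** 0.5)  # solves F(x') = sqrt(P)
    n_scaled = 2.0 * x_prime ** 2                     # 2 x'^2 = n delta^2 / sigma^2
    return n_scaled, 6.0 * n_scaled                   # three blocks of two plots each


for p in (0.99, 0.95, 0.90, 0.80, 0.25):
    n_scaled, cost = cup_tie_requirements(p)
    print(f"P = {p:.2f}   n*delta^2/sigma^2 = {n_scaled:.3f}   cost = {cost:.2f}")
```

For P = 0.25 the square root is 0.5, so x' = 0 and no sampling is needed: a random choice among four populations is already correct a quarter of the time.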
Table 1. Number of replications and cost of experiments. (i) Incomplete blocks procedure

F(x, x, x) = P     nδ²/σ²     (Cost) δ²/σ²c

     0.99           7.209         86.51
     0.95           4.252         51.02
     0.90           3.005         36.06
     0.80           1.792         21.50
     0.70           1.115         13.38
     0.60           0.665          7.98
     0.50           0.350          4.20
     0.40           0.136          1.63
     0.30           0.017          0.20
     0.25           0.000          0.00











Table 2. Number of replications and cost of experiments. (ii) Cup-tie procedure

{F(x')}² = P       nδ²/σ²     (Cost) δ²/σ²c

     0.99          13.261         79.56
     0.95           7.640         45.84
     0.90           5.328         31.97
     0.80           3.127         18.76
     0.70           1.924         11.54
     0.60           1.137          6.82
     0.50           0.594          3.56
     0.40           0.229          1.37
     0.30           0.029          0.17
     0.25           0.000          0.00








To make a general comparison of the performance of the two methods involves comparison of the 
probabilities of a correct choice when the same amount of money is spent on sampling in each case. This 
requires that if $n_1$ is the number of replications of the balanced blocks, $n_2$ (the number of replications for
the cup-tie) equals twice $n_1$. The probabilities of a correct decision then become F(x, x, x) and $\{F(x)\}^2$. If
the cup-tie method is always cheaper then

\[ F(x, x, x) < \{F(x)\}^2 \quad \text{for positive } x. \]

Or, writing both sides in an alternative form (Paulson (1952)), $E\{F^3(t+\sqrt{2}\,x)\} < [E\{F(t+\sqrt{2}\,x)\}]^2$,
where t is a unit normal variate. In neither form has the inequality been proved to hold. This seems to
indicate that, if the inequality does hold for all positive x, the difference between the two probabilities is
small for at least some values of x.
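
Although the inequality is not proved, it can be examined numerically through the alternative form, since the left-hand side is a one-dimensional expectation over a unit normal variate. The sketch below evaluates that expectation by Simpson's rule; the integration limits, step count and function names are assumptions made for this illustration, and a numerical check at a few points is of course no substitute for a proof.

```python
from math import sqrt
from statistics import NormalDist

STD = NormalDist()


def trivariate_prob(x: float, steps: int = 4000, half_width: float = 8.0) -> float:
    """Simpson's-rule value of E{F^3(t + sqrt(2) x)} for a unit normal t."""
    h = 2.0 * half_width / steps  # steps must be even for Simpson's rule
    total = 0.0
    for k in range(steps + 1):
        t = -half_width + k * h
        weight = 1 if k in (0, steps) else (4 if k % 2 else 2)
        total += weight * STD.pdf(t) * STD.cdf(t + sqrt(2.0) * x) ** 3
    return total * h / 3.0


def cup_tie_prob(x: float) -> float:
    """{F(x)}^2, the cup-tie probability at equal sampling cost."""
    return STD.cdf(x) ** 2


for x in (0.0, 0.5, 1.0, 2.0, sqrt(7.209)):
    print(f"x = {x:.3f}   F(x,x,x) = {trivariate_prob(x):.4f}   F(x)^2 = {cup_tie_prob(x):.4f}")
```

At x = 0 both sides equal 0.25, and the last value of x corresponds to the first row of Table 1.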

A further comparison of the two procedures may be made by using Wald’s minimax procedure (1950) 
instead of P and δ, to determine the number of replications in each case. Assuming that the loss involved
in making an incorrect choice is proportional to the difference between the chosen and the largest mean,
the loss function may be written

\[ E(\text{loss}) = N \sum_{i=2}^{4} (\theta_1 - \theta_i)\, p_i + 2bnc. \]

N here represents the scale of use of the chosen population and $p_i$ is the chance of choosing population $\pi_i$.
It is also assumed, as above, that the cost of sampling, c, is the same for each plot in all blocks, and that b
is the number of blocks in one replication. The expected loss is a maximum when the means take their
most unfavourable arrangement (Somerville, 1954) and its expression then simplifies to

\[ E(\text{loss}) = N\theta(1 - p_1) + 2bnc. \]
For the incomplete blocks procedure the expected loss is

\[ N\theta\{1 - F(\theta\sqrt{n}/\sigma,\ \theta\sqrt{n}/\sigma,\ \theta\sqrt{n}/\sigma)\} + 12nc. \]

When θ takes its most unfavourable value this function is at a maximum with respect to θ. This occurs at
$\theta = (0.878)\,\sigma/\sqrt{n}$. The expected loss for this value of θ is equal to

\[ N\sigma(0.3274)/\sqrt{n} + 12nc \]

(Somerville, 1954). Minimizing this maximum with respect to n gives $n = N^{2/3}\sigma^{2/3}c^{-2/3}(0.0571)$. The maximum
expected loss for this value of n is equal to $N^{2/3}\sigma^{2/3}c^{1/3}(2.055)$ and is divided in the ratio 2:1 between the
costs of a wrong decision and the cost of sampling. When there is no restriction on the comparison of the
populations the maximum expected loss for the minimax n is $N^{2/3}\sigma^{2/3}c^{1/3}(1.795)$.

The loss function for the cup-tie procedure is

\[ E(\text{loss}) = N\theta\left\{ 1 - F^2\!\left( \frac{\theta\sqrt{n}}{\sqrt{2}\,\sigma} \right) \right\} + 6nc. \]

This function is at a maximum with respect to θ when $\theta\sqrt{n}/(\sqrt{2}\,\sigma) = 0.8178$ and is then equal to
$N\sqrt{2}\,\sigma(0.3032)/\sqrt{n} + 6nc$. Minimizing this maximum with respect to n gives $n = N^{2/3}\sigma^{2/3}c^{-2/3}(0.1085)$. The
maximum expected loss for this value of n is again divided in the ratio 2:1 between the cost of wrong
decisions and the cost of sampling and is equal to $N^{2/3}\sigma^{2/3}c^{1/3}(1.953)$. The maximum expected loss is reduced
by $N^{2/3}\sigma^{2/3}c^{1/3}(0.103)$ as compared with the maximum expected loss for the incomplete blocks procedure, but
is $N^{2/3}\sigma^{2/3}c^{1/3}(0.158)$ greater than when there are no experimental restrictions.
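
Both minimax calculations have the same shape: a worst-case loss of the form A/√n + Bn is minimized over n, which gives n = (A/2B)^(2/3) and a sampling cost equal to half the decision cost, hence the 2:1 split. The sketch below reproduces the quoted constants; the function names and the grid search are illustrative assumptions, with n expressed in units of N^(2/3) σ^(2/3) c^(-2/3) and losses in units of N^(2/3) σ^(2/3) c^(1/3).

```python
from statistics import NormalDist


def minimax_n(a: float, b: float) -> tuple[float, float]:
    """Minimise a / sqrt(n) + b * n over n > 0; return (n, minimised loss)."""
    n = (a / (2.0 * b)) ** (2.0 / 3.0)  # root of -a / (2 n^1.5) + b = 0
    return n, a / n ** 0.5 + b * n


def worst_case_cup_tie(grid: int = 40_000, upper: float = 4.0) -> tuple[float, float]:
    """Grid search for the u maximising u * (1 - F(u)^2), u = theta sqrt(n) / (sqrt(2) sigma)."""
    std = NormalDist()
    best_u, best_value = 0.0, 0.0
    for k in range(1, grid + 1):
        u = upper * k / grid
        value = u * (1.0 - std.cdf(u) ** 2)
        if value > best_value:
            best_u, best_value = u, value
    return best_u, best_value


n_ib, loss_ib = minimax_n(0.3274, 12.0)             # incomplete blocks: 12 plots per replication
n_ct, loss_ct = minimax_n(2 ** 0.5 * 0.3032, 6.0)   # cup-tie: 6 plots per replication
u_star, worst = worst_case_cup_tie()
print(f"incomplete blocks: n = {n_ib:.4f}, loss = {loss_ib:.3f}")
print(f"cup-tie:           n = {n_ct:.4f}, loss = {loss_ct:.3f}")
print(f"cup-tie worst case at u = {u_star:.4f}, value = {worst:.4f}")
```

The grid search covers only the cup-tie case, where the worst-case configuration depends on the univariate normal integral alone; the constants 0.878 and 0.3274 for the incomplete blocks procedure are taken from the text.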

A comparison of the expected loss of the two procedures is given in Table 3 assuming the minimax n is
used in each case. The values of $\theta N^{1/3}\sigma^{-2/3}c^{-1/3}$ were chosen so as to make possible the use of Table 3.1 given
by Somerville (1954) for the probability of a correct choice using the incomplete blocks procedure. The
table again shows that the cup-tie procedure is more economical than the balanced incomplete blocks
procedure for these values of θ. A comparison with the unrestricted minimax procedure is made in Fig. 1.


Table 3. Expected loss of minimax experimental procedures in units of $N^{2/3}\sigma^{2/3}c^{1/3}$

                        Balanced                                              Balanced
$\theta N^{1/3}\sigma^{-2/3}c^{-1/3}$   incomplete   Cup-tie      $\theta N^{1/3}\sigma^{-2/3}c^{-1/3}$   incomplete   Cup-tie
                        blocks       procedure                                blocks       procedure
                        procedure                                             procedure

        0.0               0.69         0.65                  5.02               1.92         1.79
        0.42              0.98         0.95                  5.44               1.84         1.71
        0.84              1.25         1.21                  5.86               1.75         1.62
        1.26              1.48         1.43                  6.28               1.65         1.52
        1.67              1.67         1.61                  6.70               1.55         1.42
        2.09              1.82         1.76                  7.11               1.44         1.33
        2.51              1.93         1.86                  7.53               1.35         1.24
        2.93              2.01         1.92                  7.95               1.25         1.15
        3.35              2.05         1.95                  8.37               1.17         1.07
        3.77              2.05         1.95                 10.46               0.86         0.81
        4.19              2.04         1.92                 12.56               0.73         0.69
        4.60              1.99         1.86





3. FOUR POPULATIONS: BLOCK EFFECT A RANDOM VARIABLE 


Instead of assuming the block effects constant, it may be more realistic and make possible conclusions of
wider validity, if the block effects are assumed to vary and to be distributed normally with variance $\sigma_b^2$
independently of the residual values. The model then becomes

\[ x_{ij} = \theta_i + b_j + z_{ij}, \]

where $E(b) = 0$, $E(b^2) = \sigma_b^2$, $E(z) = 0$ and $E(z^2) = \sigma^2$. Assuming that $\sigma_b^2$ also is known, the estimate of $\theta_i$
for this model is given by

\[ t_i = \frac{\sigma^2 \sum_j x_{ij} + \sigma_b^2 \sum_j (x_{ij} - x_{i'j}) + \tfrac13\sigma_b^2 \sum_{j=1}^{6} \sum_m x_{mj}}{3\sigma^2 + 4\sigma_b^2} \]

(Cochran & Cox (1950), p. 266) and the estimate of $\theta_i - \theta_l$ by

\[ t_i - t_l = \frac{\sigma^2 \sum_j x_{ij} - \sigma^2 \sum_j x_{lj} + \sigma_b^2 \sum_j (x_{ij} - x_{i'j}) - \sigma_b^2 \sum_j (x_{lj} - x_{l'j})}{3\sigma^2 + 4\sigma_b^2}. \]

The variance of the difference is $2\sigma^2(2\sigma_b^2 + \sigma^2)/(3\sigma^2 + 4\sigma_b^2) = \sigma''^2$ and the correlation between $(t_i - t_l)$ and
$(t_i - t_m)$ is $\tfrac12$.

Fig. 1. Expected loss of restricted and unrestricted minimax procedures. (1) Balanced incomplete
blocks procedure. (2) Cup-tie procedure. (3) Unrestricted procedure.

Writing $\sigma''^2$ for $\sigma^2$, the results of the previous section may be applied. $\sigma''^2$ increases as $\sigma_b^2$ increases from
a lower limit of $2\sigma^2/3$ (when $\sigma_b^2 = 0$) to an upper limit of $\sigma^2$ (when $\sigma_b^2 = \infty$). In this situation the incomplete
blocks procedure achieves a given probability of a correct choice more cheaply than if the block effects
are constant. The analysis of the cup-tie procedure is not affected by the change in the model, since
comparisons are made only within blocks and not between blocks. If $\sigma_b^2$ is sufficiently small the incomplete
blocks procedure will be cheaper than the cup-tie procedure. Comparing the costs tabulated in Tables 1
and 2, $\sigma''^2 < 0.84\sigma^2$ ($\sigma_b^2 < 0.81\sigma^2$) is sufficient for this to hold for all these values.
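
The dependence of the effective variance on the block variance is simple enough to tabulate directly. The sketch below evaluates the variance of the difference, in units of the residual variance, as a function of the ratio r of block variance to residual variance, and inverts the relation to recover the threshold quoted above; both function names are hypothetical labels introduced for this illustration.

```python
def variance_ratio(r: float) -> float:
    """Variance of the combined estimate of a difference, in units of sigma^2,
    where r is the ratio of block variance to residual variance."""
    return 2.0 * (2.0 * r + 1.0) / (3.0 + 4.0 * r)


def block_ratio_for(target: float) -> float:
    """Invert variance_ratio: the r at which the ratio equals target (2/3 < target < 1)."""
    return (3.0 * target - 2.0) / (4.0 * (1.0 - target))


for r in (0.0, 0.5, 1.0, 1.5, 2.0, 3.0):
    print(f"r = {r:.1f}   variance ratio = {variance_ratio(r):.4f}")
print(f"ratio 0.84 is reached at r = {block_ratio_for(0.84):.4f}")
```

The ratio rises from 2/3 at r = 0 towards 1 as r grows, and the value 0.84 is reached at r = 0.8125, which rounds to the 0.81 given in the text.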
Using the incomplete blocks procedure the minimax value of n for this model, assuming the same loss
function as in the previous section is $N^{2/3}\sigma''^{2/3}c^{-2/3}(0.0571)$ and the maximum expected loss for this value of
n equal to $N^{2/3}\sigma''^{2/3}c^{1/3}(2.055)$. For the cup-tie procedure the maximum expected loss for the minimax n is
$N^{2/3}\sigma^{2/3}c^{1/3}(1.953)$ as before. This is less than that for the incomplete blocks procedure only if

\[ \sigma''^{2/3}(2.055) > \sigma^{2/3}(1.953), \]

which reduces to the condition $\sigma_b^2 > 1.006\sigma^2$.

Values for the expected loss of the two procedures are given in Table 4, assuming the minimax n is used.
Values are given for $\sigma_b^2/\sigma^2$ equal to 0.5, 1, 1.5, 2, 3. As $\sigma_b^2$ tends to infinity, the expected loss tends to the
values given in Table 3 for the incomplete blocks procedure.
Table 4. Expected loss of minimax procedures in units of $N^{2/3}\sigma^{2/3}c^{1/3}$

                                     Balanced incomplete blocks procedure
$\theta N^{1/3}\sigma^{-2/3}c^{-1/3}$   $\sigma_b^2 = 0.50\sigma^2$   $\sigma_b^2 = \sigma^2$   $\sigma_b^2 = 1.50\sigma^2$   $\sigma_b^2 = 2\sigma^2$   $\sigma_b^2 = 3\sigma^2$   Cup-tie

      0        0.64      0.65      0.66      0.66      0.67      0.65
      1        1.29      1.30      1.31      [?]       1.32      1.29
      2        1.71      [?]       [?]       [?]       1.76      1.72
      3        1.89      1.93      1.95      1.96      1.98      1.93
      4        1.88      1.93      1.96      1.98      2.00      1.94
      5        1.72      1.78      1.82      1.83      1.86      1.79
      6        1.49      1.55      1.59      1.62      1.64      1.59
      7        1.25      1.31      1.35      1.37      1.39      1.35
      8        1.05      1.10      1.12      1.15      1.17      1.14
      9        0.88      0.93      0.94      0.97      0.99      0.96
     10        0.77      0.82      0.83      0.84      0.86      0.84











Note. The second decimal may be one or two points in error as the figures were obtained by graphical 
interpolation. 


Table 4 shows that the comparison made for the maximum holds fairly well over the entire range of
$\theta N^{1/3}\sigma^{-2/3}c^{-1/3}$. If $\sigma_b^2 = 0.5\sigma^2$ the expected loss is less for the balanced blocks procedure than for the cup-tie.
When $\sigma_b^2 = \sigma^2$ the losses are very nearly equal for $\theta N^{1/3}\sigma^{-2/3}c^{-1/3}$ less than five, but the balanced incomplete
blocks procedure is more economical for larger values of θ. For the larger values of $\sigma_b^2/\sigma^2$ the cup-tie
procedure is a definite improvement.

Thus the comparison is in favour of using the standard experimental design only when $\sigma_b^2/\sigma^2$ is less than
one or one and a half. In practice this ratio will often be unknown and will have to be estimated.
Unbiased estimates of $\theta_i$ and $\theta_i - \theta_l$ can still be obtained by using weights estimated from the data
(Kempthorne, 1953). However, the variances of the estimates will be increased by the use of estimated
weights and more replications will be needed to ensure a given probability of a correct selection.
Therefore, if $\sigma_b^2/\sigma^2$ is unknown the advantage will probably still lie with the cup-tie procedure.


The author is indebted to Dr N. L. Johnson for suggesting consideration of the cup-tie procedure and 
for his comments during the course of the work. 
CORRIGENDA


(1) Biometrika (1957), 44, pp. 532-3 


‘A note on the mean deviation of the binomial distribution.’ 
By N. L. JOHNSON


Since the publication of this note I have received the following information on earlier 
work on the subject, of which I was unaware. 

Prof. O. Reiersol refers to p. 161 of an article by R. Frisch, ‘Solution d’un problème du
calcul des probabilités’ in Skand. Aktuar. Tidskr. 7 (1924), giving a derivation of formula (2)
of my note. 

Dr L. A. Aroian refers to p. 85 of C. Jordan’s Statistique Mathematique (1927) which 
quotes Frisch’s result. 

Dr E. L. Crow refers to a paper by J. S. Frame—‘Mean deviation of the binomial dis- 
tribution’, Amer. Math. Mon. 52 (1945), which gives an empirical deduction of formula (2) 
followed by an approximate formula for the mean deviation, analogous to that implied by 
my formula (4). 

N. L. JOHNSON


(2) Biometrika (1953), 40, pp. 116-27 


“On the mean successive difference and its ratio to the root mean square’ 


By A. R. Kamat 
P. 117, line 16, equation (4). 
8 


Read &(d?) = = o® for &(d3) =—o%. 
nm nT 


(3) Biometrika (1958), 45, pp. 211-21 


‘Moments of sample moments of censored samples from a normal population.’ 


By J. G. Saw 


I regret that a number of mistakes have occurred in Table 4, giving values of H;(p,, a,b), 
in the above paper. 

On using this table for further work, it became apparent that errors must exist when
p, was 0-70, 0-75 and 0-80. On checking it was found that, for example, in the computation 
of (d?), (equation (4°5)) an error had occurred for p, = 0-70; t = 2. Since (d),, (ed?),, (d*),, 
(cd3), and (c?d?), were obtained using (d?),, this led to an unfortunately large number of 
mistakes in the p, = 0-70 group of Table 4. The errors for p, = 0-75 and p, = 0-80 were 
similar but since they occurred higher up the a + b scale, the resulting ‘triangular’ spreading 
of errors was less. 

Considerable checking has been made in the tables and it is hoped that no serious errors 
remain. 

































Corrigenda 
Corrected values of H;,( p,,a, 6) in Table 4 
nee teeta ate ers a : hicks 
i=2 i=3 i=4 i=5 
siti tee ES ce be Up eee ae eee ea T 
| | | 
| a=2; b=0. p,=0-70 + 0-52227 171 +0-61553 24 | + 0-869914 + 1-08789 | 
0-75 + 0°77579 5 T 
A 
a=3; b=0. p,=0-70 + 1-25553 33 +2:101579 | + 2-96766 + 37485 gn 
0-80 + 0-68542 12 | je 
| in 
| a=2;b=1. p,=0-70 | -—1-0714532 | -—2-008197 | — 2-90650 | — 3-0618 | le 
| . | st 
| a=0; b=3. p,=0-75 | + 4932648 o1 
a=4;b=0. p,=0-60 | | 427-6435 | 3 
0-70 | +42-36619 97 +5-069063 + 9-16907 + 14-5267 ir 
0-75 | | +3:550811 | b 
0-80 | +1-1631592 | | | il 
| | | | 1 
a=3; b=1. p,=0-70 —1-90213 17 ~4-41715 0 — 826534 —14-0127 | fe 
0-80 | ~—0-9371302 | 0 
| 0 
a=2; b=2. p,=0-70 + 1-50575 16 + 3-84805 7 + 699646 413-4659 fi 
| a=1; b=3. p,=0-75 | | | + 087939 ti 
| | | ry 
| a=0; b=4. p,=0-75 | + 257-89569 g 
| 
REVIEWS


The Mathematical Theory of Epidemics. By N. T. J. BAILEY. London: Charles
Griffin and Co. 1957. Pp. 194. 36s. 


This book gives an up-to-date account of the basic mathematical theory underlying epidemiology. 
After a brief historical sketch the book discusses first the deterministic approach to epidemiology which 
grew up between the two wars. This approach owes much to the pioneering series of papers published 
jointly, between 1927 and 1939, by W. O. Kermack and A. G. McKendrick. The next, and the most 
important, portion of the book concerns the introduction of probability theory into the basic models 
leading to stochastic theories of epidemics. This portion of the book must be studied carefully by all 
students of the subject. Much of the work here is of very recent origin and the author himself has been 
one of the main contributors to the field. 

In Chapter 5 a simple stochastic epidemic, where there is infection only, is constructed and analysed 
in some detail. Next a more general stochastic epidemic where there is both infection and removal of 
infectives by death or isolation is considered. The formulae concerned are naturally more complicated 
but are derived with a fair amount of detail. Chapter 6 deals mainly with chain-binomial models 
illustrating the methods with the formulae for households containing three, four or five individuals. 
Two alternative types of model are considered here; the Reed—Frost formulation and Greenwood’s 
formulation. The latter is slightly simpler to handle mathematically, since it assumes that the chance 
of infection is independent of the number of infectives available to transmit the disease. The final part 
of this chapter considers the situation where the chance of infection does not remain constant but varies 
from individual to individual. 

Chapter 7 attempts to show how the earlier techniques can be bettered by using the observed varia- 
tions in the time intervals between successive cases. Chapter 8 deals with recurrent epidemics and 
Chapter 9 with the detection of infectiousness. At the end of the book there is a good bibliography that 
gives most of the mathematical papers that have appeared on the subject up to and including 1957. 

This book is written in the clear style that we have come to expect from Mr Bailey, and he has 
presented the whole subject in an extremely logical and orderly way, pointing the direction for further 
work. He also makes an effective plea for more practical data to be collected in a form suitable for 
mathematical analysis. The standard of mathematics required for this book is high and it is not a book
for the non-mathematical practitioner of medicine who would require a rather more practical form of
approach. As a first venture into a new field the book is, however, excellent and is a very welcome
addition to the range of statistical texts now available. P. G. MOORE


Variation and Heredity. By H. KALMUS. London: Routledge and Kegan Paul Ltd.
1958. Pp. 227. 28s. 


Dr Kalmus’s book is concerned with the causes of human variability and is the third of a new series 
(Survey of Human Biology) which it is hoped will be useful to several groups of readers, namely, persons 
interested generally in human affairs, specialists in medical or social sciences, and students who need 
documented guides in subjects related to their own. The book seems admirably suited to these aims, 
being scholarly (with a bibliography of 171 well-chosen references) and full of information, but also 
very lucidly and pleasantly written. 

Various ways in which human variability is manifested and may be measured are described in general 
terms without details of statistical method. Dr Kalmus carefully discusses the nature-nurture problem 
with reference to the pitfalls into which investigators can and do fall in disentangling these factors. 
He provides a very clear account of the chromosomal basis of inheritance (Mendelism) as evidenced 
in Man (again with some mention of statistical pitfalls in interpretation of human data). Here, as 
throughout the book, the general, and even sometimes the specialist reader, will find a great deal of 
interesting information. There are some very nice plates and diagrams, and the printing and production 
are good. Dr Kalmus also deals with the limitations on the present effectiveness of Mendelian theory
as in extrachromosomal inheritance, developmental mechanics and quantitative inheritance, thus 
giving a fair picture of the still evolving state of genetical knowledge. 


















Besides fundamentals the book introduces us to several topics of great general importance such as 
radiation dangers, AID, Eugenics, intelligence and mental defect. In excellent chapters on geographical 
variation and the genetical theory of populations, Dr Kalmus demonstrates the unscientific character 
of such a term as ‘race’, and explains the kind of genetic concepts which ought to form the context 
of realistic thinking about so-called ‘racial differences’. The price of unrealistic and unscientific thinking 
has been a heavy one for the world. There is a price to pay for unscientific thinking in other human 
affairs also, and a book like Dr Kalmus’s, though it will probably be read mostly by people already 
rather highly educated, still has a part to play, because even the university graduate is far from being 
well informed as to the full complexity of human beings, and all of us nourish some prejudices or
misconceptions acquired in our childish years. A. B. G. OWEN


Sources and Nature of the Statistics of the United Kingdom. Vol. 2. Edited by
M. G. KENDALL. Edinburgh: Published for the Royal Statistical Society by Oliver
and Boyd. 1957. Pp. 343. 30s. 


The second volume of papers on statistical sources in the United Kingdom, like the first, consists of 
reprints of articles by different authors on particular topics which had previously appeared in the 
Journal of the Royal Statistical Society. The original articles had been written at various times over 
a period of several years and were thus by the date of publication of the collection to varying extents 
out of date. The papers have, however, been brought more up to date by revision or, in many cases, 
by the addition of a short appendix on the more recent developments. The result is an invaluable 
collection of detailed discussions of the source material on a wide range of different subjects. 

The papers in this volume are, once again, divided into four groups—general surveys, commodities, 
transport and communication, and miscellaneous. Unlike the first volume, where more than half the 
articles were on particular commodities, the largest group here is made up of the general surveys.
The papers in this group in the first volume, on censuses of production and distribution, oversea trade, 
agriculture, and labour statistics, were, however, on subjects which are important in a discussion of 
the general economic situation, whereas the papers in this volume tend to be on subjects with a more 
specialized interest. 

The authors of the papers are drawn partly from the civil service and the universities and partly 
from specialists on their particular topics in business and industry. Naturally the nineteen different 
authors use widely differing approaches to their subjects, but it is interesting to note that the authors 
from the civil service and the universities, with one or two exceptions, discuss the available sources 
with little or no reference to actual numerical data, whereas the other specialists, in general, use 
considerable statistical data to illustrate their discussion. An excellent example of the former approach 
is the article on food statistics by W. D. Stedman Jones; this discusses in considerable detail the methods 
of collection of food statistics and their meaning. While many of the methods discussed are no longer 
used because of the ending of the controls on which they were based, as is noted in the addition at the 
end of the article, the article will remain of very great value to those wishing to study food statistics for 
a very interesting period. The article on criminal statistics by Tom S. Lodge is one of the exceptions 
in that, although by a civil servant, it contains several tables. Even here, however, the tables are 
clearly included, not for the intrinsic importance of the information they give, but as illustrations of 
the points which Mr Lodge wishes to make on the interpretation of the statistics. The great value of 
this article, once again, is the information it gives on the methods of collection of the statistics, on 
their meaning, and on the consequent difficulties in interpretation. 

In some of the other articles much greater use is made of the statistical data in themselves—as, for 
instance, in the estimates of net saving through life assurance in 1949 in the article on the statistics 
of British insurance by A. George Herbert and Roland D. Clarke. In one or two places elsewhere in 
the book one is left with the uncomfortable suspicion that the data are being used to prove a point 
favourable to the author’s viewpoint, rather than to inform the reader on the sources available, but 
such examples are commendably rare. 

The methods of classification of their subject-matter by a few of the authors lead to repetition of
the description of some sources. This may improve the articles as works of reference, but it does not
add to one’s pleasure when reading them as a whole. It is, in any case, difficult to remain interesting 
when writing on the sources of statistical data and it is to be feared that some of the authors have not 
succeeded. 

One inevitable disadvantage in producing a book of this nature is that, while the costs of producing
it are high and the potential market must be small, some of the articles contained in it are likely to 
become out of date very soon. It is to be hoped that the Royal Statistical Society and the publishers 
will be able to continue their good work by bringing out revised editions of the two volumes at intervals 
which are not too great. 


W. J. CORLETT 


Petrographic Modal Analysis: An Elementary Statistical Appraisal. By FELIX
CHAYES. New York: John Wiley and Sons Inc.; London: Chapman and Hall Ltd.
1956. Pp. xii+113. 44s. 


It is always interesting to a statistician to read of the application of his methods in a specialized field 
and this book is no exception. And while, apart from the first chapter, where some interesting points 
of geometrical probability are raised, the statistical content of the book is largely restricted to means 
and standard errors, the associated sampling problems, being in some respects special to petrography, 
make instructive reading. 

In the first instance the book treats of rocks, such as granite, which are composed of granules of 
different minerals, the percentage composition of which is known as the mode. A thin section of rock 
being taken, much like a histological section it would seem, the mode is determined either by means 
of a continuous line integrator along the lines of a rectangular grid, or by a point count made at the 
points of intersection of the lines of the grid. The first chapter deals with rocks in which the granules 
are randomly dispersed and establishes formally the proposition that the line integrals give unbiased 
estimates of the volumetric composition. The next treats banded rocks and the subsequent chapters 
are concerned with reproducibility, standard errors and other more technical questions. 
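
As a rough illustration of the point-count procedure described above, the following Python sketch checks by simulation that a grid count with a random origin gives an estimate of areal proportion that is unbiased on average. It is not taken from the book: the single circular "grain" in a unit-square section, the 10 x 10 grid and all function names are assumptions made purely for the example, and it concerns the areal proportion in the section only, not the step from area to volume.

```python
import math
import random


def point_count_fraction(k, offset_x, offset_y, inside):
    """Fraction of the k*k grid points, shifted by the offsets, that fall in the mineral."""
    step = 1.0 / k
    hits = 0
    for i in range(k):
        for j in range(k):
            x = offset_x + i * step
            y = offset_y + j * step
            if inside(x, y):
                hits += 1
    return hits / (k * k)


def in_disc(x, y, cx=0.5, cy=0.5, r=0.3):
    """Hypothetical mineral: one disc of radius r inside the unit-square section."""
    return (x - cx) ** 2 + (y - cy) ** 2 <= r * r


def mean_point_count(k=10, trials=2000, seed=1):
    """Average the point-count estimate over many random placements of the grid."""
    rng = random.Random(seed)
    step = 1.0 / k
    total = 0.0
    for _ in range(trials):
        # The grid origin is uniform over one grid cell, so every grid point is
        # uniformly distributed over its own cell and the cells tile the section.
        total += point_count_fraction(
            k, rng.uniform(0.0, step), rng.uniform(0.0, step), in_disc
        )
    return total / trials


TRUE_AREA = math.pi * 0.3 ** 2  # areal proportion of the disc in the unit square

if __name__ == "__main__":
    print("true areal proportion:", round(TRUE_AREA, 4))
    print("mean of point counts: ", round(mean_point_count(), 4))
```

Any single count may be some way off; it is only the average over random placements of the grid that should settle near the true proportion, which is the sense of "unbiased" used here.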


D. E. BARTON 


Non-parametric Methods in Statistics. By D. A. S. Fraser. New York: John Wiley 
and Sons, Inc.; London: Chapman and Hall, Ltd. 1957. Pp. x+ 299. 68s. 


This book is likely to be of considerable value to theoretical statisticians who are familiar with the 
advanced textbook by Cramér or that by Kendall or, preferably, with both. (The author’s intention that 
it shall serve as a direct sequel to Hoel’s intermediate book seems to me almost ludicrously optimistic, 
at least as far as British students are concerned.) Like Cramér’s book, it contains its own introduction 
to the measure-theoretic methods it uses, but this is much too compressed for the newcomer to measure 
theory. The first two chapters (124 pages) are a fairly comprehensive introduction to the abstract ideas 
of present-day theoretical statistics, not confined to non-parametric problems. Chapter 3 is a ten-page
introduction to the latter. Chapter 4 supplements Chapter 2 in the treatment of the problems of
estimation and of tolerance regions, but there is no such supplement for the confidence region problem,
which gets only a cursory six-page treatment in Chapter 2. Chapters 5-7 are the core of the book,
dealing with hypothesis testing, limiting distributions and large-sample properties of tests, all heavily 
oriented towards non-parametric procedures. 

Despite this orientation, the book by no means exhausts discussion of non-parametric problems: 
goodness-of-fit problems are not properly discussed, and although the distribution of coverages is 
derived for tolerance interval purposes, it is not used to set the closely related confidence intervals 
for percentiles. 

In fact, the book is directed to the abstract theory, and will be of little value to the user of statistical 
methods, more particularly as the mathematical style is what may be called Annals-rebarbative: how 
many statisticians who know them will unhesitatingly supply the three missing words in the following 
characteristic definition from page 17? 

‘A statistic $y=t(x)$ is...for the family of probability measures $\{P_\theta \mid \theta\in\Omega\}$ over $\mathcal{X}(\mathcal{A})$ if there exists
a function $P(A\mid t)$ such that
$$P_\theta\bigl(A\cap t^{-1}(B)\bigr)=\int_B P(A\mid t)\,dP_\theta^T(t)$$
for all $A\in\mathcal{A}$, $B\in\mathcal{B}$; that is, if there exists a determination of the conditional distribution, given $t(x)$,
which is independent of $\theta$. $P_\theta^T$ is the measure for $t(x)$ induced from the measure $P_\theta$ over $\mathcal{X}$.’

This is heavy weather. The corresponding definition in Rao’s well-known book is, with the same
three words omitted, 
‘The necessary and sufficient condition that a distribution admits...is that the probability density
can be written in the form
$$P=\Phi(T,\theta)\,\phi_1(x_1,\ldots,x_n),$$
where $\Phi(T,\theta)$ is the density of the statistic $T$ and $\phi_1(x_1,\ldots,x_n)$, the density of the sample given $T$,
is independent of $\theta$.’

A few minor cautions are necessary to the student using the book. The statement on pages 2-3 
needs to be remembered in the statement of some of the theorems later in the text. The statements, on
page 144, that the k-statistics are minimum-variance unbiased estimators of the cumulants, and on 
page 145, that the sample mean difference is the minimum-variance unbiased estimator of the popula- 
tion mean difference, are ‘non-parametric’ results in the sense that they do not hold if particular dis- 
tributional forms are considered—e.g. the mean of the rectangular distribution and the mean difference 
of the normal distribution. 

There is a fair number of trivial misprints which will cause no difficulty, and a very large number 
of problems for solution, many of them integrated with the text. The index is inadequate for a book 
likely to be of value as a reference source: authors are not indexed, although theorems are labelled
by authors’ names.


A. STUART


Wahrscheinlichkeitstheorie (Band LXXXVI of Die Grundlehren der Mathematischen
Wissenschaften). By H. RICHTER. Berlin: Springer-Verlag. 1956. Pp. 435. DM. 66.


This book is concerned to develop the theory as much as the calculus of probabilities and to do this 
abstractly as a mathematical discipline based on measure theory with an axiomatic basis similar to
Kolmogoroff’s. About one-third of the book is concerned with set theory and derived integration 
theory. Another third deals with the concept of probability as an induction from experience and the 
consequences of this in regard to the axiomatics, together with the general development of the set of 
axioms defined here. The rest of the book develops what may be distinguished as the calculus of 
probabilities; transformation of variables, moments and characteristic functions, limit theorems, 
standard distributions, central limit theorems, etc. As a consequence of the heavy development of the 
earlier chapters these are dealt with very thoroughly, but this much restricts the range of problems 
covered. 

The author says that the book is designed for mathematics students and that, in effect, it will also 
act as a textbook for set and integration theory. However, while he is at pains to point out the inclusion 
of the more easily comprehended classical elementary theory in his general development, it may be 
wondered whether the student who has read it will have a more flexible and understanding grasp of 
probability theory than one who has read a more down-to-earth development of half the length. 


D. E. BARTON 


Statistica. By FRANCESCO BRAMBILLA. Milan: La Goliardica. Vol. I. La variabilita
strutturale. 1955. Pp. 672. Vol. II. La teoria della stima. 1956. Pp. 688. L. 9000.


These are textbooks of modern statistical mathematics and techniques written by an Italian for his 
fellow-countrymen. The books are excellent and show evidence of wide reading in all the available 
literature. The standard of exposition is high. On the whole, English students of statistics will not 
learn more from these books than they can already get from a combination of Yule & Kendall’s 
Introduction and Kendall’s Advanced Theory. There is, however, a certain freshness about well-known 
theory when it is presented in a foreign language and for those wishing to practise their Italian this 
will be an excellent gift. 

Volumes I and II are not sold separately. They are printed by some kind of off-set process which
makes the price seem rather startling. A third volume La Teoria della Inferenza is promised. 


F. N. DAVID 


Probability, An Intermediate Textbook. By M. T. L. BIZLEY. Cambridge: Published
for the Institute and Faculty of Actuaries at the University Press. 1957. Pp. viii+ 
230. £1. 


This book gives an elementary treatment of the simpler problems of classical probability theory. The 
logical sequence of the development of the subject necessarily follows, initially, the well-worn path: 
rules of probability, elementary combinatorial problems, binomial and hypergeometric distributions. 
These are followed by Bayes’s theorem and what may be called the Generalized Addition Law for 
probabilities of independent events. This is rather oddly attributed to Waring (1792), though it was 
clearly stated by De Moivre (1724) and in fact, for the case of three events, by Halley (1693) who gave 
the geometrical picture of the theorem beloved of modern set theorists. Next follow chapters on 
expectations, on problems which may yield their solution as a difference equation and on simple runs 
of two alternatives. Finally, there is a chapter which introduces probability density functions by 
means of the intuitively plausible concepts of geometrical probability. 

The book is apparently a course-book designed to cover a specific syllabus and this vitiates its appeal 
to the general reader, since the topics covered are selective rather than encyclopaedic. On the other 
hand, it faces the limited mathematical attainments of its intended readers with ingenuity and the 
problems are, considering this limitation, surprisingly representative. Sometimes the ingenuity is 
restrictive however; for instance, it would be hard to derive the normal distribution plausibly or 
instructively from a situation couched in terms of classical geometrical probabilities. 

A more serious criticism of the logical development of the book arises when the first chapter, on the 
nature of probability, and the appendix, on theories of probability, are considered. The author’s own 
opinions are not clear: he would seem to subscribe to the ‘principle of insufficient reason’, which is 
hard to distinguish from the most subjective of degrees-of-belief theories, and to the views of Perks 
which many will consider to introduce an unnecessary ambiguity into the words ‘equally likely’ with 
very little gained. Apart from these points he gives an ear more or less impartially to the different 
schools of thought without having adequate space to deal with the contradictions between them. It 
may be doubted whether such treatment will help the student to a clear idea of what he is doing when 
he uses probability methods even if he understands, in a rather potted form, something of the different 
theories. There is in any event a danger that the student will accept such highly abridged versions as
the last word, to the permanent detriment of his understanding. Further, it is particularly surprising 
that, in a book for actuarial students where such topics are broached, more emphasis is not laid on 
frequency theories. It would appear to a non-actuary, such as the reviewer, that the insurance 
companies, if no one else, would stand or fall according as to the frequencies with which the various 
contingencies of life occur and to the degree to which we may predict future frequencies from past
records.


D. E. BARTON


Statistika Metoder. Vols. I and II. By H. HYRENIUS. Goteborg, Sweden: Gumperts.
1957. Pp. 625. Swedish Kr. 42 (£2. 18s.) 


This is a very elementary book about statistics for non-statisticians knowing hardly any mathematics. 
It is in three parts (previously issued as stencilled volumes) the first two being contained in Vol. 1 of 
the present binding. 

The first deals with the processing of data: tabulation, punched cards, graphical representation, the 
computation of means, standard deviations, $\chi^2$ (for contingency tables), correlation and regression
coefficients, curvilinear and partial regression and correlation coefficients. There is a wealth of worked 
examples from medical, biological, sociological and economic fields. This part ends with an uncommonly 
sensible commonsense treatment of economic time series. 

The second part outlines the theory underlying the first, the results being for the most part stated 
without proof, but the concepts and definitions being covered by ample, if rough, discussion and 
illustration. In this way a very large amount of ground is covered: elementary probability theory;
the binomial, Poisson, multinomial, and uni- and bi-variate normal distributions; graduation by 
Pearson and Gram-Charlier curves; maximum likelihood estimation, confidence intervals, significance 
testing and power curves, the t- and F-tests and their distributions; sampling variances and expectation 
theory, all being treated. The third part covers in more detail particular testing situations, namely, 
those of contingency, goodness-of-fit, regression, simple analysis of variance, curvilinear regression and
ranking. 

It is an excellent book of its kind and one feels not only that the medical man or biologist who follows
it through will have a wide variety of elementary statistical techniques at his disposal but that he will
be enabled to use them intelligently.


D. E. BARTON


A Course in Multivariate Analysis. (Griffin’s Statistical Monographs and Courses, 
no. 2.) By M. G. KENDALL. London: Charles Griffin and Company Ltd. 1957. Pp. 185.
22s. 


An Introduction to Multivariate Statistical Analysis. By T. W. ANDERSON. New
York: John Wiley and Sons Inc.; London: Chapman and Hall Ltd. 1958. Pp. xii+ 
374. 100s. 


Some Aspects of Multivariate Analysis. (Indian Statistical Series, no. 1.) By 
S. N. Roy. New York: John Wiley and Sons Inc.; Calcutta: Indian Statistical 
Institute. 1957. Pp. viii+214. 64s. 


These three books, published almost simultaneously, are the first to appear that deal exclusively and 
broadly with multivariate analysis. There is little overlap between them. They complement each other; 
each is excellent in its way. 

Prof. Kendall’s book is the revised version of a set of lecture notes for a course given at the University 
of North Carolina, Raleigh, and also at the Virginia Polytechnic Institute, Blacksburg, in 1954, and 
again subsequently at the London School of Economics. The author says in the Preface: ‘Multivariate 
Analysis in statistics is apt to be a baffling subject, especially for those students who want to use it in 
solving practical problems but do not possess the time or the inclination to plumb the depths of the 
mathematical theory to which it leads. This course was prepared with practical applications very much 
in the foreground. In it I have tried to expound the essential concepts and techniques and have limited 
the mathematical treatment as much as possible. In the present stage of knowledge this is no loss. 
The analysis of multivariate material requires to an unusual degree that peculiar blend of insight and 
skill in probabilistic interpretation which characterises the statistician, and for which pure mathematics 
is no substitute.’ 

Prof. Kendall has achieved his aim with the skill that one would expect from him. He begins by
explaining the notion of analysing a sample of observations taken from a multivariate population into 
its principal components. This leads to a discussion of factor analysis and the estimation of functional 
relationships, and then an account of canonical analysis. A descriptive account (without proofs) of 
some relevant sampling theory and other mathematical topics follows, some multivariate tests gene- 
ralizing those of ordinary analysis of variance are given, and finally there is an account of discriminatory 
analysis. All the main concepts and methods are illustrated by examples drawn from the literature— 
over twenty of them. These examples, with critical discussion, are the main strength of the book. They 
provide an answer to anyone who may ask: what is multivariate analysis all about, what is the use of it? 
Most of the discussions are persuasive. Your reviewer was particularly impressed by the beautiful 
treatment of a problem of regression with collinearities (pp. 71-3). At the other extreme, he was 
particularly unimpressed by an application of covariance analysis (p. 135)—but then, anyone who 
honestly tries to discuss multivariate analysis in the round, and not only the mathematical theory of it, 
can hardly fail to provoke dissent occasionally!

The book is easy to read, and makes only modest demands on the reader’s mathematical and 
statistical knowledge. It is likely to be particularly welcomed by persons who have already some slight
or one-sided acquaintance with multivariate analysis and desire to see the whole subject in perspective 
and have key references. The text is photolithographed from typescript, and the cover is soft. 

Prof. Anderson’s book is also introductory, but with a different aim—not an appraisal of multivariate 
methods, but a coherent presentation of the mathematical theory. ‘This book has been designed 
primarily as a text for a two-semester course in multivariate statistics. It is hoped that the book will 
also serve as an introduction to many topics in this area to statisticians who are not students and will 
be used as a reference by other statisticians. For several years the book in the form of dittoed notes 
has been used in a two-semester sequence of graduate courses at Columbia University.... It is hoped
that the more basic and important topics are treated here, though to some extent the coverage is 
a matter of taste. Some of the more recent and advanced developments are only briefly touched on
in the last chapter.’ 

The book has the appearance of being the fruit of many years’ labour. It is most carefully written, 
and every argument is set out in full detail and with elegance. The first half of the book (corresponding 
to the first semester’s course) treats of the following topics: the multivariate normal distribution, 
maximum likelihood estimation of the mean vector and covariance matrix, the sampling distribution 
of correlation coefficients, the $T^2$ statistic, and classification problems. The second half deals with: the
Wishart and related distributions, various significance tests derived by the likelihood ratio principle, 
principal components, canonical analysis, and some distribution theory of characteristic roots in the 
null case. There is also the final review chapter mentioned above, an appendix on matrix theory, and 
a fine bibliography. An amusing indication of the difference in intention between this and the preceding 
book is their treatment of that old sore thumb, factor analysis. Prof. Kendall, trying to exhibit what 
people do, gets to the subject early and does the best he can for it. Prof. Anderson, trying to present 
a clear and not-too-difficult body of theory, relegates it to a niche among the recent and advanced 
developments. 

Undoubtedly Prof. Anderson’s book will long remain the standard textbook and work of reference 
for multivariate theory based on the normal distribution. If such a beautifully written and beautifully 
printed book can have any fault—of being ever so slightly too smooth—what a good fault, and how 
easily remedied by reading either of the other two works under review! 

About Prof. Roy’s book, one is tempted to say that it begins where Prof. Anderson’s leaves off. 
Literally that is untrue, as it is self-contained and presupposes no previous knowledge of multivariate 
theory. But it can certainly be described as a recent and advanced development, and its main object 
is outside Anderson’s scope. ‘This monograph does not by any means attempt to cover the entire area 
of multivariate analysis, or even a major part of it. Aside from certain basic notions and results due 
to Fisher, Hotelling, Mahalanobis, Karl Pearson, Wilks, Wishart, Yule and some of their predecessors,
which have now become current coin, this monograph is primarily concerned with those developments 
in multivariate analysis in which the author has been specially interested and with which he and some 
of his collaborators have been associated over several years. Part of the material presented here, as 
far as the author is aware, has not been published before, while the rest has been collected from papers 
by various workers in this sector including the author and his collaborators. It will be seen that in 
the monograph the statistical approach to different problems and the mathematical treatment of all 
such problems are uniform and perhaps somewhat individual, and that this applies to all specific results, 
no matter whether they are due to the author and his collaborators, or to other workers in 
the field or to both groups simultaneously.’ 

Prof. Roy begins by enunciating a principle of test construction somewhat different from the usual 
likelihood ratio method. (Anyone who can be saying something interesting about tests by as soon as 
page 6 of his book deserves applause.) With this he chooses test criteria for a number of standard 
normal-theory null hypotheses. The criteria can be expressed in terms of the characteristic roots of 
a determinantal equation—sometimes the largest root, sometimes both largest and smallest roots. 
Associated with each test are measures of departure from the null hypothesis, based on a concept of 
distance; in terms of these the author obtains bounds for the operating characteristics of the tests. 
He is then able to achieve his main objective, namely, to derive by inversion confidence bounds for 
these measures of departure from the null hypothesis. There is also a final chapter concerned with the 
analysis of categorical data (contingency tables). Work is evidently still in progress, and there are 
various references to further results to be presented in another monograph or in a second edition of this 
one. The book will interest many whose concern is theoretical research in statistics. 

F. J. ANSCOMBE 


Statistical Exercises. Part II. Compiled by N. L. JOHNSON. Issued by Department of
Statistics, University College, London. 1957. Pp. 107. 12s. 


Even if one keeps a careful record of the details of problems dealt with in statistical consulting work, 
one frequently finds that there is a dearth of suitable examples for teaching purposes. Research workers 
can often cope with the analysis of standard situations themselves, and are liable to consult a statis- 
tician only when some awkward deviation from the textbook pattern occurs. For this reason a graded 
series of standard exercises is always welcome. 

The present compilation by Dr N. L. Johnson is divided into four main sections. The first deals with 
the simpler analysis of variance techniques, and starts with tests for equality of variances and the 
transformation of variables. Such matters as Latin squares, missing plot technique and
cross-classification with unequal frequencies are included here.

Next, we have a set of exercises on factorial experiments. Examples are given on confounding,
fractional replication, split plots and incomplete blocks. 

The third section covers correlation and regression, including multiple and curvilinear regression. 
A useful feature is the provision of notes on the Choleski method of inverting a symmetric matrix. 

Finally, there are examples on a number of miscellaneous methods such as dosage-mortality tech- 
niques, discriminant functions, time series and curve-fitting. 

These exercises have been excellently chosen and are likely to be of great use in teaching the appli- 
cation of statistical methods, though some of the examples are probably too long for use in a practical 
class. However, a student taking a course in statistics wants the opportunity to work through a good 
range of different problems without having too much repetition, and without having to spend too 
much time on investigating the influence of awkward complications in the data. 

This volume makes a most welcome addition to the set of exercises on more elementary matters 
previously issued by the Department of Statistics at University College London. Students of statistics 
and their harassed teachers are now no doubt eagerly awaiting the supplement, promised in the preface,
containing worked solutions or suggested methods of attack!


NORMAN T. J. BAILEY


Tables of the Non-Central t-Distribution. By G. J. RESNIKOFF and G. J. LIEBERMAN.
Stanford, California: Stanford University Press. 1957. Pp. 389. $12.50. 


Possibly the simplest application of the present tables lies in the problem of estimating the proportion,
$P$, of a Gaussian population which exceeds a given fixed limit $L$. This proportion depends only on the
ratio $U=(L-\mu)/\sigma$, which may be estimated from a sample of $n$ observations by the statistic
$u=(L-\bar{x})/s$. It may be shown that $\sqrt{n}\,u$ can be expressed as the ratio of $(z+\delta)$ to $\sqrt{w}$, where
$\delta=\sqrt{n}\,U$, $z$ is a unit normal deviate and $w$ is distributed independently as $\chi^2/f$ with $f=n-1$
degrees of freedom. Such a quantity has been termed a non-central $t$ variable and this seems to be the
natural extension of the usual way in which the central ‘Student’ $t=zw^{-1/2}$ is now defined. However,
in tabulating the non-central distribution, with its practical applications in view, one soon finds that
it is not sufficient to consider only moderate values of $t$. In the problem just mentioned, for instance,
the non-centrality parameter $\delta$ is proportional to $\sqrt{n}$ and increases with the sample size. Since the
mean of the $t$-distribution is approximately $\delta$, the major part of the distribution is not therefore
confined (as it is in the central case) to moderate values. Accordingly, to secure more compact
tabulation, Resnikoff and Lieberman have in the present volume used the argument $x=tf^{-1/2}$. Against
this quantity at intervals of 0.05 they tabulate both the probability density and the probability
integral of $t$ to four decimal places. This information is provided for 280 separate distribution curves
defined by 28 values of $f$, viz. $f=2(1)24$ and $24(5)49$, each associated with 10 values of $\delta$, or rather
10 values of $\delta(f+1)^{-1/2}$, viz. 0.674490, 1.036433, 1.281552, 1.514102, 1.750686, 1.959964, 2.326348,
2.652070, 2.807034 and 3.090232. It will be observed that these latter are standard multiples
corresponding to simple fractions of the Gaussian distribution and indeed, in application to the
problem described above, correspond directly to $P$ = 0.25, 0.15, 0.10, 0.065, 0.04, 0.025, 0.01, 0.004,
0.0025 and 0.001.
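
The correspondence between the ten tabulated multiples and the ten proportions can be checked directly, and the construction of the non-central t variable can be imitated by simulation. The Python sketch below is offered only as an illustration under stated assumptions: it uses the standard library's `statistics.NormalDist`, the function names are invented for the example, the particular choice of 24 degrees of freedom (a sample of 25) is arbitrary, and the closed form for the mean is the standard textbook expression rather than anything quoted from the volume under review.

```python
import math
import random
from statistics import NormalDist

# Proportions P and the multiples delta*(f+1)^(-1/2) as quoted in the review.
P_VALUES = [0.25, 0.15, 0.10, 0.065, 0.04, 0.025, 0.01, 0.004, 0.0025, 0.001]
TABULATED = [0.674490, 1.036433, 1.281552, 1.514102, 1.750686,
             1.959964, 2.326348, 2.652070, 2.807034, 3.090232]


def normal_multiple(p):
    """Standard normal deviate that is exceeded with probability p."""
    return NormalDist().inv_cdf(1.0 - p)


def simulate_noncentral_t(f, delta, size, seed=0):
    """Draw (z + delta) / sqrt(w), with z unit normal and w distributed as chi-square(f)/f."""
    rng = random.Random(seed)
    draws = []
    for _ in range(size):
        z = rng.gauss(0.0, 1.0)
        w = rng.gammavariate(f / 2.0, 2.0) / f  # gamma(f/2, scale 2) is chi-square(f)
        draws.append((z + delta) / math.sqrt(w))
    return draws


def exact_mean(f, delta):
    """Mean of the non-central t (standard result, valid for f > 1)."""
    log_ratio = math.lgamma((f - 1) / 2.0) - math.lgamma(f / 2.0)
    return delta * math.sqrt(f / 2.0) * math.exp(log_ratio)


if __name__ == "__main__":
    for p, tabulated in zip(P_VALUES, TABULATED):
        print(p, tabulated, round(normal_multiple(p), 6))
    f = 24                              # assumed: a sample of n = f + 1 = 25
    delta = math.sqrt(f + 1) * 0.674490  # the case P = 0.25
    draws = simulate_noncentral_t(f, delta, 20000)
    print("delta:", round(delta, 4))
    print("simulated mean:", round(sum(draws) / len(draws), 4))
    print("exact mean:    ", round(exact_mean(f, delta), 4))
```

The mean exceeds the non-centrality parameter only slightly at this number of degrees of freedom, which is the sense in which the review calls it approximately equal to the parameter.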

In a short table at the end of the volume certain percentage points of the 280 distributions are also 
provided, obtained by inverse interpolation from the probability integral. In this connexion the 
authors state that, where comparisons are possible, there is satisfactory agreement with the percentage 
points tabled by Johnson and Welch in Biometrika, 31. (These were calculated by means of an
asymptotic expansion from the starting point that $(t-\delta)(1+t^2/2f)^{-1/2}$ has approximately a standard
Gaussian distribution.)
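
To indicate how percentage points can follow from a starting point of this kind, the first-order step may be sketched as below. This is only the leading approximation, worked out here for illustration, and not a reproduction of Johnson and Welch's full asymptotic expansion; the symbol $z_p$ for the standard Gaussian deviate with lower-tail probability $p$ is an assumption of the sketch.

```latex
% Assumption: z_p is the standard Gaussian deviate with lower-tail probability p.
% Setting the approximately Gaussian quantity equal to z_p and squaring:
\[
  \frac{t-\delta}{\sqrt{1+t^{2}/2f}} = z_p
  \;\Longrightarrow\;
  \Bigl(1-\frac{z_p^{2}}{2f}\Bigr)t^{2} - 2\delta t + \bigl(\delta^{2}-z_p^{2}\bigr) = 0 .
\]
% Taking the root for which t - \delta has the sign of z_p (this requires z_p^2 < 2f):
\[
  t_p \approx
  \frac{\delta + z_p\sqrt{1+(\delta^{2}-z_p^{2})/2f}}{1 - z_p^{2}/2f} .
\]
```

As a check, the expression reduces to $t_p=\delta$ when $z_p=0$, and to $t_p\approx\delta+z_p$ when $f$ is large.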

In their introductory matter Resnikoff and Lieberman describe the applications of their tables to 
the familiar problems associated with the estimation of proportions of a Gaussian population and with 
the calculation of power functions of statistical tests. They include also reference to sequential ratio 
tests and to the calculation of certain expectations of functions of t which have occurred in the literature 
of sampling inspection. 

Comprehensive tables of the density function and of the probability integral of the non-central 
t-distribution have not been given before, and the volume under review represents a most valuable 
advance in the tabulation of a function which confronts the numerical worker with a number of rather 
difficult problems. 


B. L. WELCH


Statistické Tabulky (Statistical Tables). Edited by JAROSLAV JANKO. Prague: Ceskoslovenské
Akademie Véd (Czechoslovakian Academy of Sciences). 1958. Pp. 251.
Price 18.20 Kčs.


This publication contains forty statistical tables, occupying 140 pages. There are 108 pages of contents,
index and introduction, and, at the end of the book, a small glossary index, giving English and Russian 
equivalents for Czech statistical terms used in the introduction. 

The method of arrangement in groups of related tables follows that used in Biometrika Tables for 
Statisticians. The format is similar, too, so that, particularly when looking at the introduction, there 
is a feeling of familiarity which unfortunately disappears on trying to read the text. 

Tables 1-8 (Normal, χ² and t distributions) are more or less familiar to most statisticians. The Normal
tables are rather more concise than usual, and there are no dosage-mortality tables, but the tables
of χ² significance limits (based on the work of A. Hald and S. A. Sinkbæk) are unusually extensive,
and there is also a table of significance limits for χ² ÷ (degrees of freedom).

Tables 9-13 (non-central t, central and non-central F distributions) include charts of power functions
of certain t- and F-tests, and are reproduced from Biometrika sources, as are also Table 15 (charts for
confidence intervals for the correlation coefficient), Tables 19-22 (distribution of range and of
studentized range) and Table 25 (studentized extreme deviate). Also, Tables 29-31 (Poisson distribution)
have close parallels in Biometrika Tables for Statisticians. 

Table 14 gives percentage points of the distribution of the correlation coefficient reproduced from Fisher
and Yates’s Tables, while Table 32, giving values of 2 sin⁻¹ √x, is rather more extensive than, but
similar to, another table in this publication.

Table 16 gives tables of significance limits for Cochran’s criterion (max (s²)/Σs²), reproduced from
Selected Techniques of Statistical Analysis; Table 35 (tests for randomness of grouping in a sequence) 
also comes from this source. Other tables from American sources include Tables 23-27 (outlier criteria) 
and 34 (tolerance limits for a Normal distribution). 

Tables 17—18 reproduce Hald’s tables for estimating parameters of truncated and censored Normal 
distributions. Table 28 gives confidence limits for a binomial proportion—an unusually extensive table 
—and Table 33 distribution-free limits for the median (acknowledgement is made to K. R. Nair). 

The final group of tables (37-40), from American, Hungarian and Russian sources, give significance
limits for various ‘Kolmogorov’ criteria, based on comparisons of observed with theoretical cumulative 
distribution functions, or with each other. 

So far as could be gleaned from a very rough translation, the introductory text gives categorical 
directions for applying the various techniques, with but little scope for individual judgement. However, 
most English-speaking readers will value the book mainly for the special tables it contains, particularly
nos. 8 (χ² ÷ (degrees of freedom)), 16-18, 28, 34 and 37-40. The quality of paper and binding
is not as good as would be desirable in a volume which is likely to be handled fairly frequently. There 
are also a number of misprints—rather more than usual in a book of this type. Examples are: 
A(t, to, €) for A(f, to, €) on p. 135, several misplaced decimal points in Table 17, and ¢,, for t, or 4%, on 
pp. 39-40 in the introduction. However, none of the misprints found were really misleading, or more
than minor nuisances. N. L. JOHNSON


Tables of Integrals and other Mathematical Data (third edition). By Herbert B.
Dwight. New York and London: The Macmillan Company. 1957. Pp. 288. 21s.


For those research workers in subjects where mathematical techniques are used as a tool, as a means 
to an end instead of an end in itself, this third edition of condensed information is valuable. Many 
statisticians use and value the tables of Smithsonian mathematical formulae, and this present book, 
which was first in the field, covers in algebra and trigonometry much the same ground. The supple- 
mentary numerical tables, given in most cases to only four decimal figures, are useful for quick 
calculations. These tables are of the usual trigonometric functions, the hyperbolic functions, natural 
logarithms and Bessel functions. 
The book is useful for reference purposes and should be bought by statistical libraries. 


F. N. DAVID 
Fractional Factorial Experiment Designs for Factors at Two Levels. U.S. Dept.
of Commerce: National Bureau of Standards, Applied Mathematics Series no. 48.
1957. Pp. ii + 85. 50 cents.


It is well known that the use of factorial designs in experimental work may involve the collection of 
a very large number of observations (or measurements). It is natural therefore that attention should 
be paid to the possibility of the collection of fewer measurements, and the statistical technique which 
has emerged is that known as ‘fractional’ factorial experimental designs. This booklet lists 125 experi- 
mental designs of this kind. It will be useful to everyone concerned with this particular type of 
statistical analysis. F. N. DAVID
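
The saving in observations which such designs afford may be illustrated by a small sketch. It is not taken from the booklet: it constructs a half-fraction of a two-level factorial by the common device of setting the level of the last factor equal to the product of the levels of the others, and both the function name `half_fraction` and this choice of defining relation are assumptions made for the illustration.

```python
from itertools import product


def half_fraction(k):
    """Return the runs of a half-fraction of a 2^k factorial design.

    Levels are coded -1 and +1. The last factor is set equal to the
    product of the first k - 1 factors (an assumed defining relation),
    so only 2^(k-1) of the 2^k treatment combinations are used.
    """
    runs = []
    for levels in product((-1, 1), repeat=k - 1):
        last = 1
        for x in levels:
            last *= x
        runs.append(levels + (last,))
    return runs


if __name__ == "__main__":
    for run in half_fraction(4):
        print(run)
```

For four factors this gives 8 runs in place of the 16 required by the complete factorial, at the cost of confounding certain effects with one another.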


Vector Analysis. By L. Brand. New York: John Wiley and Sons, Inc.; London:
Chapman and Hall Ltd. 1957. Pp. 299. 48s. 


This book gives the reader an excellent introduction to vector algebra, vector geometry and vector 
calculus and this introduction is complemented by three well-thought out chapters on applications of 
these topics to dynamics, fluid mechanics and electrodynamics. 

The first five chapters form the content of a good course on vector analysis for physicists and 
chemists, and also give the applied mathematician an insight into this valuable tool at the level of 
the first or second (honours) year. The book work here is clear and the emphasis sound: it includes 
kinematics, differential geometry and the integral theorems of Green and Stokes. Interlaced with the 
text are relevant and not too taxing examples. One imagines that a worked example followed by 
exactly the same question in the problems is a minor aberration in documenting the material: see 
example 1 on page 27, together with problem 3 on page 29. 

The applications are necessarily less scholarly, since they treat complex physical disciplines in all 
too brief chapters. Dynamics occupies 18 pages and we get little more than statements on rigid body 
problems. The section on the solar system is pleasing. Electrodynamics, treated in Lorentz-Heaviside
(the 4π’s occasionally disappear as on page 211) and Giorgi units, is again too short except as a revision
course in electrostatics and magnetostatics.

The final chapter is a good attempt at introducing the young student to the ideas of linear vector 
spaces including Hilbert space at an early stage in his career. E. A. POWER 


Linear Algebra for Undergraduates. By D. C. Murdoch. New York: John Wiley and
Sons Inc.; London: Chapman and Hall Ltd. 1957. Pp. xi + 239. 44s.


This textbook has been written for students reading mathematics and the physical sciences. They are 
supposed to have reached a standard something like the G.C.E. Advanced Level in this country, but 
not yet to be ready for the mathematically sophisticated point of view taken in modern abstract 
algebra. 

The topics covered are best described by listing the chapter headings: 

(1) Vectors and vector spaces. 

(2) Matrices, rank, and systems of linear equations. 

(3) Further algebra of matrices. 

(4) Further geometry of real vector spaces. 

(5) Transformations of co-ordinates in a vector space. 

(6) Linear transformations in a vector space. 

(7) Similar matrices and diagonalization theorems. 

(8) Reduction of quadratic forms. 

(9) Vector spaces over the complex field. 

One has the impression throughout that the text has been carefully written and clearly displayed,
and that it reflects the experience of a good teacher. As some of the chapter headings indicate, there 
is a development by stages which match the reader’s growing feeling for the subject. 

Vectors are introduced as n-tuples of numbers; the abstract definition is explained in one of two 
appendices. Determinants of order n are defined and used, but the reader is referred elsewhere for the 
set of simple properties related to the performance of elementary operations on the rows and columns.
Three-dimensional position vectors are much used for illustration; for those who need it, the second 
appendix briefly sets out the elements of co-ordinate geometry of three dimensions. 

This book should appeal to many students in this country. They will find exercises which are mostly 
straightforward and sometimes indicate further theoretical ideas. Answers are provided. There is
a good index. J. W. ARCHBOLD


Numerical Analysis. Vol. VI. (Proceedings of 6th Symposium in Applied Mathematics
of the American Mathematical Society held 1953.) Edited by John H. Curtiss.
New York, Toronto and London: McGraw-Hill Book Company Inc., for The American
Mathematical Society. 1956. Pp. 303. 73s.


This volume contains nineteen papers out of the twenty-one originally given at the Symposium on 
numerical analysis held at Santa Monica in August, 1953. The papers range widely in subject-matter, 
weight and formality of presentation. 

Questions of approximation of functions are treated by Wasow in a paper giving conditions under 
which a variate can be asymptotically expanded in terms of a normal variate; by Walsh in an expository 
paper on best-approximation polynomials over finite point sets; and by Hastings et al. in a note on 
the practical approximation of functions, which gives some interesting examples, including a rational
approximation to the normal integral serviceable for all positive values of the argument. A paper 
by Sard, on function spaces and approximation, bears on integral representations of remainders. 

Methods for solving sets of linear equations figure in three contributions. Hestenes deals with the 
conjugate-gradient method and its relation to n-step iterative methods generally. Young’s paper on 
iterative methods is largely expository. Fischbach applies the steepest descent and conjugate gradient 
methods to linear sets and also to differential equations. An extensive general paper on the method 
of steepest descent is given by Rosenbloom. 

Papers on the assignment problem, on the application of high-speed digital computers to problems 
whose variables are permutations, and on computational problems in the theory of dynamic program- 
ming are contributed by Motzkin, Tompkins and Bellman, respectively. A useful section of Tompkins’s 
paper is on the systematic generation of permutations. 

Summarizing the other contributions, there are three papers (Bruck, Emma Lehmer, Olga Taussky) 
on number theory, one (Warschawski) on conformal mapping, one (Wielandt) on error bounds for 
eigenvalues of symmetric integral equations when obtained by numerical quadrature, and three 
(Bergman, Clutterham and Taub, Frankel) on different problems of numerical solution of partial 
differential equations. 

The book is well produced and has a good index. T. LEWIS


Note regarding 
Bibliography of nonparametric statistics and related topics 


Dr I. R. Savage is arranging for a revision of this Bibliography which was published in 1953 in the 
Journal of the American Statistical Association, 48, 844-906. Material up to and including publications 
of the year 1959 will be incorporated and it is planned to lay more emphasis on applications than before. 
References (particularly to literature not in the English language), reprints and technical reports on 
the theory or applications of nonparametric statistics would be greatly appreciated. Also, corrections 
and additions to the original Bibliography are desired. 

Material should be sent to: 


I. RICHARD SAVAGE,
Statistics Department, University of Minnesota, 
Minneapolis, Minnesota, USA 
TRACTS FOR COMPUTERS


Department of Statistics, University College, London 


I. Tables of the Digamma and Trigamma Functions. By ELEANOR PAIRMAN, M.A.
Tables for summing $S = \sum_{n} \dfrac{1}{(p_1 n + q_1)(p_2 n + q_2)\cdots}$, where the p’s and q’s are numerical
factors. Price 5s. net.


V. Table of Coefficients of Everett’s Central-Difference Interpolation Formula. By A. J. 
THOMPSON, PH.D. Second edition. Price 7s. 6d. net. 


VIII. Table of the Logarithms of the Complete Γ-Function (to ten decimal places) for
Argument 2 to 1200 beyond Legendre’s Range (Argument 1 to 2). By EGON S. PEARSON,
D.Sc. Price 5s. net. 


IX. Log Γ(x) from x = 1 to 50.9 by intervals of 0.01. By JOHN BROWNLEE, M.D., D.Sc.
Price 5s. net. 


X. On Quadrature and Cubature or on Methods of Determining Approximately Single 
and Double Integrals. By J. O. IRWIN, D.Sc. Price 7s. 6d. net.


XII. Tables of the Probable Error of the Coefficient of Correlation. By KARL HOLZINGER,
PH.D. Price 5s. net.


XIII. Bibliotheca Tabularum Mathematicarum, being a Descriptive Catalogue of Mathematical
Tables. Part I. A, Logarithms of Numbers. By JAMES HENDERSON, PH.D. Price 9s. net.


XV. Random Sampling Numbers. By L. H. C. Tippett, M.Sc., with a Foreword by KARL 
PEARSON. Price 5s. net. 


XXIII. Tables of tan⁻¹x and log(1+x²). To assist in the calculation of the ordinates of a Pearson
Type IV curve. By L. J. COMRIE, PH.D. Price 5s. net.


XXIV. Random Sampling Numbers (2nd Series). By M. G. KENDALL and B. BABINGTON SMITH.
Price 5s. net.


XXV. Random Normal Deviates. By HERMAN WOLD. Price 5s. net. 


XXVI. Correlated Random Normal Deviates. By E. C. FIELLER, T. LEWIS and E. S. PEARSON.
Price 10s. 6d. net. 


Nos. II, III, IV, VI and VII are out of print 





LOGARITHMETICA BRITANNICA 


A standard Table of Logarithms to Twenty Decimal Places. By A. J. THOMPSON, Ph.D. 
(commenced in 1922 to commemorate the tercentenary of the publication of HENRY BRIGGs’s 
Arithmetica Logarithmica). 
The nine separate sections of this Table have now been issued, and the complete work 
consisting of the logarithms of numbers 10,000-100,000, together with Dr Thompson’s 
General Introduction (98 pp.) is available in two bound volumes. 


Price £8. 8s. 0d.